mafex-0.1.0.tar.gz

mafex-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2024 Muhammet Anıl Yağız
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
mafex-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,195 @@
+ Metadata-Version: 2.4
+ Name: mafex
+ Version: 0.1.0
+ Summary: Morpheme-Aligned Faithful Explanations for Turkish NLP
+ Author-email: Muhammet Anıl Yağız <213255046@kku.edu.tr>
+ License-Expression: MIT
+ Project-URL: Homepage, https://github.com/anilyagiz/mafex
+ Project-URL: Documentation, https://github.com/anilyagiz/mafex#readme
+ Project-URL: Repository, https://github.com/anilyagiz/mafex
+ Keywords: nlp,xai,explainability,turkish,morphology,interpretability
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Intended Audience :: Science/Research
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: torch>=2.0.0
+ Requires-Dist: transformers>=4.35.0
+ Requires-Dist: numpy>=1.24.0
+ Requires-Dist: tqdm>=4.65.0
+ Provides-Extra: full
+ Requires-Dist: captum>=0.6.0; extra == "full"
+ Requires-Dist: shap>=0.42.0; extra == "full"
+ Requires-Dist: matplotlib>=3.7.0; extra == "full"
+ Requires-Dist: seaborn>=0.12.0; extra == "full"
+ Requires-Dist: scipy>=1.10.0; extra == "full"
+ Requires-Dist: pandas>=2.0.0; extra == "full"
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.4.0; extra == "dev"
+ Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
+ Requires-Dist: black>=23.0.0; extra == "dev"
+ Requires-Dist: isort>=5.12.0; extra == "dev"
+ Dynamic: license-file
+
+ # MAFEX - Morpheme-Aligned Faithful Explanations
+
+ > **Beyond the Token: Correcting the Tokenization Bias in XAI via Morphologically-Aligned Projection**
+
+ [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+ ## Overview
+
+ MAFEX is a framework for generating faithful, interpretable explanations for Large Language Models (LLMs) in **Morphologically Rich Languages (MRLs)** like Turkish.
+
+ Current XAI methods operate on tokens, which fragment semantic units in agglutinative languages. MAFEX corrects this **Tokenization-Morphology Misalignment (TMM)** by projecting attributions onto linguistically meaningful morphemes.
+
+ ### Key Equation
+
+ ```
+ φ_morph = A · φ_tok (Morphological Projection)
+ S* = λ·φ_morph + (1-λ)·φ_causal (Causal Regularization)
+ ```
+
+ Where:
+ - **A** ∈ {0,1}^{K×T} is the Alignment Matrix mapping T tokens to K morphemes
+ - **φ_tok** is token-level attribution (e.g., Integrated Gradients)
+ - **λ** controls the gradient/causal trade-off (default: 0.7)
+
+ ## Installation
+
+ ```bash
+ # Install from PyPI
+ pip install mafex
+
+ # Or install latest from GitHub
+ pip install git+https://github.com/anilyagiz/mafex.git
+ ```
+
+ ## Quick Start
+
+ ### 1. Morphological Analysis
+
+ ```python
+ from mafex.morphology import MorphemeAnalyzer
+
+ analyzer = MorphemeAnalyzer()
+
+ # Analyze Turkish word
+ analysis = analyzer.analyze_word("gelemedim") # "I could not come"
+ print(analysis.morpheme_surfaces) # ['gel', 'eme', 'di', 'm']
+ ```
+
+ ### 2. Run MAFEX Explanation
+
+ ```python
+ from mafex.models import DemoModelWrapper
+ from mafex.projection import MAFEXPipeline
+
+ # Load model
+ wrapper = DemoModelWrapper()
+ wrapper.load()
+
+ # Create MAFEX pipeline
+ mafex = MAFEXPipeline(
+     wrapper.model,
+     wrapper.tokenizer,
+     lambda_causal=0.7
+ )
+
+ # Generate explanation
+ result = mafex.explain("Gelemedim")
+
+ # Get top attributed morphemes
+ print(result.get_top_morphemes(3))
+ # [('-eme', 0.62), ('gel', 0.21), ('-di', 0.12)]
+ ```
+
+ ### 3. Command Line
+
+ ```bash
+ # Single explanation
+ python run_mafex.py --model demo --text "Gelemedim"
+
+ # Evaluation
+ python run_mafex.py --model berturk --eval --samples 10
+ ```
+
+ ## Project Structure
+
+ ```
+ mafex/
+ ├── mafex/
+ │   ├── __init__.py          # Package exports
+ │   ├── morphology.py        # Morphological analysis & alignment
+ │   ├── attribution.py       # IG, SHAP, DeepLIFT baselines
+ │   ├── projection.py        # MAFEX pipeline & causal regularization
+ │   ├── models.py            # Model wrappers (BERTurk, Cosmos, etc.)
+ │   └── visualization.py     # Plotting utilities
+ ├── evaluation/
+ │   ├── __init__.py
+ │   └── metrics.py           # ERASER metrics
+ ├── benchmark/
+ │   ├── __init__.py
+ │   └── trust_tr.py          # Trust-TR benchmark (N=850)
+ ├── notebooks/
+ │   └── demo.ipynb           # Interactive demonstration
+ ├── demo.py                  # CLI demo script
+ ├── run_mafex.py             # Main runner
+ ├── config.yaml              # Configuration
+ └── requirements.txt         # Dependencies
+ ```
+
+ ## Supported Models
+
+ | Model | Type | Status |
+ |-------|------|--------|
+ | BERTurk | Encoder | ✅ Tested |
+ | YTÜ-Cosmos | Decoder | ⚠️ Pending |
+ | Kumru | Decoder | ⚠️ Pending |
+ | Aya-23 | Decoder | ⚠️ Pending |
+
+ ## Evaluation Metrics
+
+ MAFEX is evaluated using ERASER metrics:
+
+ - **Comprehensiveness**: Does removing important features hurt performance?
+ - **Sufficiency**: Are important features alone enough to maintain performance?
+
+ Expected results:
+
+ | Model | Token-IG | Random | MAFEX | Δ |
+ |-------|----------|--------|-------|---|
+ | BERTurk | 0.42 | 0.50 | 0.68 | +62% |
+ | Cosmos | 0.39 | 0.47 | 0.65 | +67% |
+ | Kumru | 0.45 | 0.53 | 0.71 | +58% |
+ | Aya-23 | 0.41 | 0.49 | 0.69 | +68% |
+
+ ## Citation
+
+ ```bibtex
+ @article{yagiz2024mafex,
+   title={Beyond the Token: Correcting the Tokenization Bias in XAI via Morphologically-Aligned Projection},
+   author={Yağız, Muhammet Anıl},
+   journal={arXiv preprint},
+   year={2024}
+ }
+ ```
+
+ ## License
+
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+
+ ## Acknowledgments
+
+ - [Zemberek](https://github.com/ahmetaa/zemberek-nlp) for Turkish morphological analysis
+ - [Captum](https://captum.ai/) for attribution methods
+ - [dbmdz/BERTurk](https://huggingface.co/dbmdz/bert-base-turkish-cased) for the Turkish BERT model
mafex-0.1.0/README.md ADDED
@@ -0,0 +1,154 @@
+ # MAFEX - Morpheme-Aligned Faithful Explanations
+
+ > **Beyond the Token: Correcting the Tokenization Bias in XAI via Morphologically-Aligned Projection**
+
+ [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+ ## Overview
+
+ MAFEX is a framework for generating faithful, interpretable explanations for Large Language Models (LLMs) in **Morphologically Rich Languages (MRLs)** like Turkish.
+
+ Current XAI methods operate on tokens, which fragment semantic units in agglutinative languages. MAFEX corrects this **Tokenization-Morphology Misalignment (TMM)** by projecting attributions onto linguistically meaningful morphemes.
+
+ ### Key Equation
+
+ ```
+ φ_morph = A · φ_tok (Morphological Projection)
+ S* = λ·φ_morph + (1-λ)·φ_causal (Causal Regularization)
+ ```
+
+ Where:
+ - **A** ∈ {0,1}^{K×T} is the Alignment Matrix mapping T tokens to K morphemes
+ - **φ_tok** is token-level attribution (e.g., Integrated Gradients)
+ - **λ** controls the gradient/causal trade-off (default: 0.7)
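
The two equations can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names (`build_alignment`, `project`, `blend`), the character-span inputs, the tokenization of "gelemedim", and the rule that each token is assigned to the morpheme it overlaps most (first one on ties) are all hypothetical, and the attribution values are made up.

```python
def overlap(a, b):
    """Length of the intersection of two half-open character spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def build_alignment(morpheme_spans, token_spans):
    """Return a K x T 0/1 matrix A (assumed rule: best-overlapping morpheme wins)."""
    K, T = len(morpheme_spans), len(token_spans)
    A = [[0] * T for _ in range(K)]
    for t, tok in enumerate(token_spans):
        # max() returns the first maximal index, so ties go to the earlier morpheme.
        k = max(range(K), key=lambda i: overlap(morpheme_spans[i], tok))
        A[k][t] = 1
    return A


def project(A, phi_tok):
    """phi_morph = A . phi_tok: sum token attributions into their morpheme."""
    return [sum(a * p for a, p in zip(row, phi_tok)) for row in A]


def blend(phi_morph, phi_causal, lam=0.7):
    """S* = lam * phi_morph + (1 - lam) * phi_causal."""
    return [lam * m + (1 - lam) * c for m, c in zip(phi_morph, phi_causal)]


# "gelemedim" -> morphemes gel | eme | di | m (character spans)
morpheme_spans = [(0, 3), (3, 6), (6, 8), (8, 9)]
# Hypothetical subword split: ge | le | me | di | m
token_spans = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 9)]

A = build_alignment(morpheme_spans, token_spans)
phi_tok = [0.10, 0.20, 0.40, 0.20, 0.10]   # made-up token attributions
phi_morph = project(A, phi_tok)
phi_causal = [0.2, 0.5, 0.2, 0.1]          # made-up causal scores per morpheme
scores = blend(phi_morph, phi_causal, lam=0.7)
```

Note that the token `le` straddles the `gel`/`eme` boundary, which is exactly the misalignment case; a binary A has to send it to one side, and how MAFEX resolves that is not stated here.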
+
+ ## Installation
+
+ ```bash
+ # Install from PyPI
+ pip install mafex
+
+ # Or install latest from GitHub
+ pip install git+https://github.com/anilyagiz/mafex.git
+ ```
+
+ ## Quick Start
+
+ ### 1. Morphological Analysis
+
+ ```python
+ from mafex.morphology import MorphemeAnalyzer
+
+ analyzer = MorphemeAnalyzer()
+
+ # Analyze Turkish word
+ analysis = analyzer.analyze_word("gelemedim") # "I could not come"
+ print(analysis.morpheme_surfaces) # ['gel', 'eme', 'di', 'm']
+ ```
+
+ ### 2. Run MAFEX Explanation
+
+ ```python
+ from mafex.models import DemoModelWrapper
+ from mafex.projection import MAFEXPipeline
+
+ # Load model
+ wrapper = DemoModelWrapper()
+ wrapper.load()
+
+ # Create MAFEX pipeline
+ mafex = MAFEXPipeline(
+     wrapper.model,
+     wrapper.tokenizer,
+     lambda_causal=0.7
+ )
+
+ # Generate explanation
+ result = mafex.explain("Gelemedim")
+
+ # Get top attributed morphemes
+ print(result.get_top_morphemes(3))
+ # [('-eme', 0.62), ('gel', 0.21), ('-di', 0.12)]
+ ```
+
+ ### 3. Command Line
+
+ ```bash
+ # Single explanation
+ python run_mafex.py --model demo --text "Gelemedim"
+
+ # Evaluation
+ python run_mafex.py --model berturk --eval --samples 10
+ ```
+
+ ## Project Structure
+
+ ```
+ mafex/
+ ├── mafex/
+ │   ├── __init__.py          # Package exports
+ │   ├── morphology.py        # Morphological analysis & alignment
+ │   ├── attribution.py       # IG, SHAP, DeepLIFT baselines
+ │   ├── projection.py        # MAFEX pipeline & causal regularization
+ │   ├── models.py            # Model wrappers (BERTurk, Cosmos, etc.)
+ │   └── visualization.py     # Plotting utilities
+ ├── evaluation/
+ │   ├── __init__.py
+ │   └── metrics.py           # ERASER metrics
+ ├── benchmark/
+ │   ├── __init__.py
+ │   └── trust_tr.py          # Trust-TR benchmark (N=850)
+ ├── notebooks/
+ │   └── demo.ipynb           # Interactive demonstration
+ ├── demo.py                  # CLI demo script
+ ├── run_mafex.py             # Main runner
+ ├── config.yaml              # Configuration
+ └── requirements.txt         # Dependencies
+ ```
+
+ ## Supported Models
+
+ | Model | Type | Status |
+ |-------|------|--------|
+ | BERTurk | Encoder | ✅ Tested |
+ | YTÜ-Cosmos | Decoder | ⚠️ Pending |
+ | Kumru | Decoder | ⚠️ Pending |
+ | Aya-23 | Decoder | ⚠️ Pending |
+
+ ## Evaluation Metrics
+
+ MAFEX is evaluated using ERASER metrics:
+
+ - **Comprehensiveness**: Does removing important features hurt performance?
+ - **Sufficiency**: Are important features alone enough to maintain performance?
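
As a rough sketch of what these two metrics measure, the snippet below follows the usual ERASER definitions: comprehensiveness is the drop in predicted probability when the rationale is removed, and sufficiency is the drop when only the rationale is kept. Everything else is an assumption for illustration: the toy classifier, its weights, and the function names are hypothetical and are not the API of `evaluation/metrics.py`.

```python
import math

# Hypothetical per-morpheme weights for a toy classifier (not from MAFEX).
WEIGHTS = {"gel": 0.5, "eme": 2.0, "di": 0.3, "m": 0.1}


def predict_proba(morphemes):
    """Toy model: sigmoid of the summed weights of the morphemes present."""
    score = sum(WEIGHTS[m] for m in morphemes)
    return 1.0 / (1.0 + math.exp(-score))


def comprehensiveness(units, rationale):
    """p(full input) - p(input with the rationale removed); higher is better."""
    kept = [u for u in units if u not in rationale]
    return predict_proba(units) - predict_proba(kept)


def sufficiency(units, rationale):
    """p(full input) - p(rationale only); lower is better."""
    only = [u for u in units if u in rationale]
    return predict_proba(units) - predict_proba(only)


units = ["gel", "eme", "di", "m"]

# A rationale that picks the truly influential morpheme...
comp_top = comprehensiveness(units, {"eme"})
suff_top = sufficiency(units, {"eme"})

# ...versus one that picks a nearly irrelevant morpheme.
comp_weak = comprehensiveness(units, {"m"})
suff_weak = sufficiency(units, {"m"})
```

In this toy setup, the rationale `{"eme"}` scores higher on comprehensiveness and lower on sufficiency than `{"m"}`, which is the pattern a faithful explanation should show.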
+
+ Expected results:
+
+ | Model | Token-IG | Random | MAFEX | Δ |
+ |-------|----------|--------|-------|---|
+ | BERTurk | 0.42 | 0.50 | 0.68 | +62% |
+ | Cosmos | 0.39 | 0.47 | 0.65 | +67% |
+ | Kumru | 0.45 | 0.53 | 0.71 | +58% |
+ | Aya-23 | 0.41 | 0.49 | 0.69 | +68% |
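
The README does not define the Δ column. The values are consistent with the relative gain of MAFEX over Token-IG, (MAFEX − Token-IG) / Token-IG, rounded to a whole percent; the check below reproduces them under that assumption. The helper name `relative_gain_pct` is hypothetical.

```python
# (Token-IG, MAFEX, reported delta in percent), copied from the table above.
rows = {
    "BERTurk": (0.42, 0.68, 62),
    "Cosmos": (0.39, 0.65, 67),
    "Kumru": (0.45, 0.71, 58),
    "Aya-23": (0.41, 0.69, 68),
}


def relative_gain_pct(baseline, score):
    """Relative improvement of score over baseline, as a rounded percentage."""
    return round((score - baseline) / baseline * 100)


computed = {name: relative_gain_pct(ig, mafex) for name, (ig, mafex, _) in rows.items()}
reported = {name: delta for name, (_, _, delta) in rows.items()}
```

Under this reading, `computed` equals `reported` for all four rows.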
+
+ ## Citation
+
+ ```bibtex
+ @article{yagiz2024mafex,
+   title={Beyond the Token: Correcting the Tokenization Bias in XAI via Morphologically-Aligned Projection},
+   author={Yağız, Muhammet Anıl},
+   journal={arXiv preprint},
+   year={2024}
+ }
+ ```
+
+ ## License
+
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+
+ ## Acknowledgments
+
+ - [Zemberek](https://github.com/ahmetaa/zemberek-nlp) for Turkish morphological analysis
+ - [Captum](https://captum.ai/) for attribution methods
+ - [dbmdz/BERTurk](https://huggingface.co/dbmdz/bert-base-turkish-cased) for the Turkish BERT model
@@ -0,0 +1,35 @@
+ """
+ Evaluation package for MAFEX.
+ """
+
+ # Samples are always available (no torch dependency)
+ from .samples import (
+     get_test_samples,
+     get_negative_samples,
+     get_positive_samples,
+     TURKISH_SAMPLES
+ )
+
+ # Lazy import for torch-dependent modules
+ def __getattr__(name):
+     """Lazy import for metrics that require torch."""
+     if name in ("ERASEREvaluator", "EvaluationResult", "BenchmarkRunner",
+                 "compute_faithfulness_correlation", "compare_methods"):
+         from .metrics import (
+             ERASEREvaluator, EvaluationResult, BenchmarkRunner,
+             compute_faithfulness_correlation, compare_methods
+         )
+         return locals()[name]
+     raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
+
+ __all__ = [
+     "ERASEREvaluator",
+     "EvaluationResult",
+     "BenchmarkRunner",
+     "compute_faithfulness_correlation",
+     "compare_methods",
+     "get_test_samples",
+     "get_negative_samples",
+     "get_positive_samples",
+     "TURKISH_SAMPLES"
+ ]
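
The module-level `__getattr__` above relies on PEP 562 (Python 3.7+): when normal attribute lookup on a module fails, Python calls the module's `__getattr__` with the missing name. The standalone sketch below shows the same mechanism with a lookup table and `importlib` instead of `locals()`. It is an illustrative variant, not a proposed change to the package: `demo_pkg`, `_LAZY`, and `_lazy_getattr` are hypothetical names, and stdlib modules stand in for the torch-dependent `metrics` module.

```python
import importlib
import types

# Hypothetical mapping: public name -> module that actually defines it.
_LAZY = {"sqrt": "math", "dumps": "json"}

demo_pkg = types.ModuleType("demo_pkg")


def _lazy_getattr(name):
    """PEP 562 hook: import the owning module only on first access."""
    if name in _LAZY:
        value = getattr(importlib.import_module(_LAZY[name]), name)
        # Cache on the module so later lookups bypass this hook.
        setattr(demo_pkg, name, value)
        return value
    raise AttributeError(f"module 'demo_pkg' has no attribute {name!r}")


# Placing the function in the module namespace is what activates the hook.
demo_pkg.__getattr__ = _lazy_getattr

root = demo_pkg.sqrt(9.0)          # triggers the lazy import of math.sqrt
cached = "sqrt" in vars(demo_pkg)  # the resolved name is now stored on the module

try:
    demo_pkg.missing
    raised = False
except AttributeError:
    raised = True
```

Caching the resolved attribute is a design choice: it keeps repeated access cheap, at the cost of the name staying bound even if the underlying module is later reloaded.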