tokenbreak_scanner-0.1.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,33 @@
+                     GNU AFFERO GENERAL PUBLIC LICENSE
+                        Version 3, 19 November 2007
+
+ Copyright (C) 2026 TokenBreak Scanner Contributors
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU Affero General Public License as published
+ by the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU Affero General Public License for more details.
+
+ You should have received a copy of the GNU Affero General Public License
+ along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+ Also add information on how to contact you by electronic and paper mail.
+
+ If your software can interact with users remotely through a computer
+ network, you should also make sure that it provides a way for users to
+ get its source. For example, if your program is a web application, its
+ interface could display a “Source” link that leads users to an archive
+ of the code. There are many ways you could offer source, and different
+ solutions will be better for different programs; see section 13 for the
+ specific requirements.
+
+ You should also get your employer (if you work as a programmer) or school,
+ if any, to sign a “copyright disclaimer” for the program, if necessary.
+ For more information on this, and how to apply and follow the GNU AGPL,
+ see <https://www.gnu.org/licenses/>.
+
@@ -0,0 +1,3 @@
+ include README.md
+ include LICENSE
+ recursive-include src *.py
@@ -0,0 +1,324 @@
+ Metadata-Version: 2.4
+ Name: tokenbreak-scanner
+ Version: 0.1.0
+ Summary: Scanner for TokenBreak vulnerabilities in NLP model artifacts
+ Author: TokenBreak Scanner Contributors
+ License: AGPL-3.0-or-later
+ Project-URL: Homepage, https://github.com/your-org/tokenbreak-scanner
+ Project-URL: Repository, https://github.com/your-org/tokenbreak-scanner
+ Project-URL: Documentation, https://github.com/your-org/tokenbreak-scanner#readme
+ Project-URL: Issues, https://github.com/your-org/tokenbreak-scanner/issues
+ Keywords: tokenbreak,tokenizer,security,nlp,adversarial,bpe,wordpiece,unigram,huggingface,vulnerability-scanner
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: Intended Audience :: Science/Research
+ Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
+ Classifier: Topic :: Security
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Operating System :: OS Independent
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: transformers>=4.40.0
+ Requires-Dist: tokenizers>=0.19.0
+ Requires-Dist: pydantic>=2.0.0
+ Requires-Dist: click>=8.0.0
+ Requires-Dist: rich>=13.0.0
+ Requires-Dist: numpy>=1.17
+ Provides-Extra: attack
+ Requires-Dist: torch>=2.0.0; extra == "attack"
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
+ Requires-Dist: httpx>=0.24.0; extra == "dev"
+ Dynamic: license-file
+
+ # 🔍 TokenBreak Scanner — NLP Model Vulnerability Auditor
+
+ Detect **TokenBreak adversarial vulnerabilities** in LLMs, classifiers, and encoders before they hit production.
+
+ [![PyPI](https://img.shields.io/pypi/v/tokenbreak-scanner)](https://pypi.org/project/tokenbreak-scanner)
+ [![Python](https://img.shields.io/pypi/pyversions/tokenbreak-scanner)](https://pypi.org/project/tokenbreak-scanner)
+ [![License](https://img.shields.io/badge/License-AGPL--3.0--or--later-blue.svg)](LICENSE)
+ [![Tests](https://github.com/your-org/tokenbreak-scanner/actions/workflows/ci.yml/badge.svg)](https://github.com/your-org/tokenbreak-scanner/actions)
+ [![Downloads](https://img.shields.io/pypi/dm/tokenbreak-scanner)](https://pypi.org/project/tokenbreak-scanner)
+
+ [📄 TokenBreak Paper](https://arxiv.org/html/2506.07948v1) · [⚡ Installation](#-installation) · [📖 Quick Start](#-quick-start) · [🚀 CI Integration](#-ci-integration)
+
+ ---
+
+ ## 🔥 What is TokenBreak?
+
+ **TokenBreak** is an adversarial attack that bypasses text-classification and LLM guardrails by prepending a single character to targeted words. This tiny perturbation changes how **BPE** and **WordPiece** tokenizers segment the input, causing protection models to misclassify malicious text as benign — while humans and downstream LLMs still understand the original meaning.
+
+ > 📄 **Paper**: [TokenBreak: Bypassing Text Classification Models Through Token Manipulation](https://arxiv.org/html/2506.07948v1)
+
+ ---
+
+ ## 🛡️ What TokenBreak Scanner Does
+
+ | Capability | Description |
+ |---|---|
+ | **Inspect model artifacts** | Reads `config.json`, `tokenizer.json`, and `tokenizer_config.json` to determine the tokenizer architecture |
+ | **Identify tokenization algorithm** | Detects **BPE**, **WordPiece**, **Unigram**, or **SentencePiece** |
+ | **Assess TokenBreak vulnerability** | Flags BPE/WordPiece models as **High Risk**; Unigram as **Safe** |
+ | **Explain the decision** | Evidence tree showing which signals (config, runtime, source) contributed to the confidence score |
+ | **Recommend a defense** | Suggests the Unigram pre-mapping defense (Section 5 of the paper) or migration to a safer model family |
+ | **Live attack validation** *(optional)* | Loads model weights and runs the `BreakPrompt` algorithm to verify the bypass experimentally |
+
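As a minimal illustration of the artifact-inspection idea (not the scanner's actual implementation), the declared algorithm can be read straight from the `model.type` field of a HuggingFace fast-tokenizer `tokenizer.json`; `detect_algorithm` is a hypothetical helper name:

```python
import json
from pathlib import Path

def detect_algorithm(model_dir: str) -> str:
    """Return the tokenizer algorithm ("BPE", "WordPiece", "Unigram")
    declared in a HuggingFace fast-tokenizer artifact."""
    data = json.loads((Path(model_dir) / "tokenizer.json").read_text())
    return data["model"]["type"]
```

A single-file read like this is only one signal: slow-tokenizer models ship without `tokenizer.json`, which is why the scanner aggregates several sources of evidence.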
+ ---
+
+ ## 📦 Installation
+
+ ### From PyPI (recommended)
+
+ ```bash
+ pip install tokenbreak-scanner
+ ```
+
+ ### Optional extras
+
+ - **Attack validation** (requires PyTorch):
+   ```bash
+   pip install "tokenbreak-scanner[attack]"
+   ```
+ - **Development dependencies** (pytest, coverage):
+   ```bash
+   pip install "tokenbreak-scanner[dev]"
+   ```
+
+ ### From source
+
+ ```bash
+ git clone https://github.com/your-org/tokenbreak-scanner.git
+ cd tokenbreak-scanner
+ pip install -e ".[dev]"
+ ```
+
+ ---
+
+ ## 🚀 Quick Start
+
+ ### Scan a local model directory
+
+ ```bash
+ tokenbreak-scan ./models/my-spam-classifier/
+ ```
+
+ ### Scan a HuggingFace model ID (auto-download)
+
+ ```bash
+ tokenbreak-scan distilbert-base-uncased --download
+ ```
+
+ ### Scan Qwen3-0.6B (production example)
+
+ ```bash
+ tokenbreak-scan Qwen/Qwen3-0.6B --download --trust-remote-code
+ ```
+
+ > ✅ Expected result: **HIGH risk** — Qwen uses BPE tokenization and is vulnerable to TokenBreak.
+
+ ### JSON output (for CI pipelines)
+
+ ```bash
+ tokenbreak-scan <model> --output json
+ ```
+
+ ### Run live attack validation
+
+ ```bash
+ tokenbreak-scan <model> --test-attack
+ ```
+
+ ---
+
+ ## 🧪 Example Output
+
+ ### Table format (vulnerable model)
+
+ ```
+ ======================================================================
+ TOKENBREAK SCANNER REPORT
+ ======================================================================
+ Model Name: distilbert-base-uncased
+ Model Type: distilbert
+ Family: DistilBERT
+ Tokenizer Class: DistilBertTokenizerFast
+ Algorithm: WordPiece
+ Vocab Size: 30522
+ Confidence: 0.85
+ Vulnerable: YES ⚠️
+ Risk Level: High
+ ======================================================================
+ Detection Sources:
+ 1. [tokenizer.json model.type] weight=0.40 -> WordPiece
+ 2. [tokenizer_config.json class] weight=0.20 -> WordPiece
+ 3. [config.json model_type] weight=0.15 -> WordPiece
+ ======================================================================
+ Recommendation:
+ Model is vulnerable to TokenBreak. Recommended: Implement the
+ Unigram pre-mapping defense by inserting a Unigram tokenizer
+ before classification and remapping tokens, or migrate to a
+ model family using Unigram tokenization.
+ ======================================================================
+ ```
+
+ ### JSON format
+
+ ```json
+ {
+   "model_name": "distilbert-base-uncased",
+   "model_type": "distilbert",
+   "model_family": "DistilBERT",
+   "tokenizer_class": "DistilBertTokenizerFast",
+   "tokenizer_algorithm": "WordPiece",
+   "vocab_size": 30522,
+   "confidence_score": 0.85,
+   "vulnerable_to_tokenbreak": true,
+   "risk_level": "High",
+   "detection_sources": [...],
+   "recommendation": "...",
+   "source": "/path/to/model"
+ }
+ ```
+
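A pipeline consuming that JSON report could gate deployment with a few lines. This is a hedged sketch: the field names follow the example report above, and the real schema should be checked before relying on it.

```python
import json

# Inline report for illustration; in CI you would read scan-report.json.
# Field names mirror the example JSON report above.
report = json.loads("""
{
  "model_name": "distilbert-base-uncased",
  "tokenizer_algorithm": "WordPiece",
  "vulnerable_to_tokenbreak": true,
  "risk_level": "High"
}
""")

def should_block(report: dict) -> bool:
    """Block deployment when the scanner flags the model."""
    return bool(report.get("vulnerable_to_tokenbreak")) or report.get("risk_level") == "High"

print(should_block(report))  # True for this vulnerable example
```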
+ ---
+
+ ## 🔄 CI Integration
+
+ Use exit codes to gate vulnerable models in your MLOps pipeline:
+
+ | Exit code | Meaning |
+ |---|---|
+ | `0` | Model is **safe** (Unigram) or unknown |
+ | `1` | Model is **vulnerable** (BPE or WordPiece) — block deployment |
+ | `2` | Error (path not found, download failed, etc.) |
+
+ ### GitHub Actions example
+
+ ```yaml
+ - name: Scan model for TokenBreak vulnerability
+   run: |
+     pip install tokenbreak-scanner
+     tokenbreak-scan ./models-to-deploy/ --output json > scan-report.json
+     cat scan-report.json
+ ```
+
+ ### Python SDK example
+
+ ```python
+ from tokenbreak_scanner.inspector import inspect_model
+ from tokenbreak_scanner.models import RiskLevel
+
+ report = inspect_model("./models/my-spam-classifier/", download=False)
+ if report.risk_level == RiskLevel.HIGH:
+     raise RuntimeError(f"Deployment blocked: {report.model_name} is vulnerable to TokenBreak")
+ ```
+
+ ---
+
+ ## 🔬 How the Attack Works
+
+ 1. **Attack vector**: Prepend a single character (A–Z or a–z) to high-impact words, forcing BPE/WordPiece to split them into different tokens.
+ 2. **Effect**: The protection model misclassifies the input (a false negative), while the downstream LLM or human reviewer still understands it.
+ 3. **Defense**: Insert a Unigram tokenizer before the target tokenizer. Unigram segments text by maximizing sequence probability rather than applying left-to-right merges, making it naturally resistant to character-level perturbations.
+
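A toy greedy longest-match tokenizer (a deliberate simplification of WordPiece, with a made-up four-entry vocabulary) shows why one prepended character changes every token the classifier sees:

```python
def wordpiece(word: str, vocab: set) -> list:
    # Greedy longest-match-first segmentation; continuation pieces
    # carry the "##" prefix, as in HuggingFace WordPiece.
    tokens, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:          # no piece matches -> unknown token
            return ["[UNK]"]
        tokens.append(piece)
        start = end
    return tokens

vocab = {"prize", "x", "##pr", "##ize"}   # made-up mini vocabulary
print(wordpiece("prize", vocab))   # ['prize']
print(wordpiece("xprize", vocab))  # ['x', '##pr', '##ize']
```

The clean word maps to the single token a spam classifier was trained on; the perturbed word maps to three tokens it has likely never seen together, while a human still reads "prize".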
+ ---
+
+ ## 📊 Supported Model Families
+
+ | Family | Algorithm | TokenBreak Risk |
+ |---|---|---|
+ | GPT-2, GPT-J, GPT-Neo, GPT-NeoX | BPE | 🔴 **HIGH** |
+ | LLaMA, Mistral, Mixtral, Falcon | BPE | 🔴 **HIGH** |
+ | Qwen, Qwen2, Qwen3 | BPE | 🔴 **HIGH** |
+ | Gemma, Gemma 2 | BPE | 🔴 **HIGH** |
+ | Phi-3, Phi-4 | BPE | 🔴 **HIGH** |
+ | BLOOM, BigScience | BPE | 🔴 **HIGH** |
+ | Cohere, Command R | BPE | 🔴 **HIGH** |
+ | BERT, DistilBERT, RoBERTa | WordPiece / BPE | 🔴 **HIGH** |
+ | DeBERTa-v2, DeBERTa-v3 | Unigram | 🟢 **LOW** |
+ | XLM-RoBERTa | Unigram | 🟢 **LOW** |
+ | ALBERT | Unigram | 🟢 **LOW** |
+ | mT5, T5 (SentencePiece Unigram) | Unigram | 🟢 **LOW** |
+
+ > 📋 Full mapping in `src/tokenbreak_scanner/tokenizers.py`
+
+ ---
+
+ ## 🏗️ Architecture
+
+ ```
+ tokenbreak_scanner/
+ ├── __init__.py    # Package version
+ ├── cli.py         # Click CLI with Rich table / JSON output
+ ├── inspector.py   # Core introspection engine (6-signal weighted aggregation)
+ ├── models.py      # Pydantic schemas: ScannerReport, DetectionSource, RiskLevel
+ ├── tokenizers.py  # Tokenizer type detection, model-family mapping, runtime inspection
+ └── validator.py   # Optional live attack validation via BreakPrompt
+ ```
+
+ ### Detection signals (weighted evidence tree)
+
+ | Signal | Weight | Source |
+ |---|---|---|
+ | `tokenizer.json` model type | 0.40 | HuggingFace tokenizer artifact |
+ | Runtime `_tokenizer.model` | 0.40 | Live Rust backend inspection |
+ | Source-code fingerprint | 0.30 | Python module keyword matching |
+ | Remote source file | 0.30 | HF Hub downloaded tokenizer module |
+ | `tokenizer_config.json` class | 0.20 | Config metadata |
+ | `config.json` model_type | 0.15 | Architecture fallback |
+
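One plausible reading of weighted aggregation — a sketch only, not the scanner's exact formula — sums the weights of agreeing signals and picks the best-supported algorithm. Note the example report's confidence of 0.85 with these three signals (0.40 + 0.20 + 0.15 = 0.75) suggests the real implementation applies additional signals or a different normalization.

```python
from collections import defaultdict

def aggregate(signals):
    """signals: (source, weight, detected_algorithm) triples.
    Returns the best-supported algorithm and a capped confidence."""
    totals = defaultdict(float)
    for _source, weight, algo in signals:
        totals[algo] += weight
    best = max(totals, key=totals.get)
    return best, min(1.0, round(totals[best], 2))

signals = [
    ("tokenizer.json model.type", 0.40, "WordPiece"),
    ("tokenizer_config.json class", 0.20, "WordPiece"),
    ("config.json model_type", 0.15, "WordPiece"),
]
print(aggregate(signals))  # ('WordPiece', 0.75)
```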
+ ---
+
+ ## 🧪 Testing
+
+ ```bash
+ pytest tests/ -v
+ ```
+
+ The suite of 33+ tests covers:
+ - Local model directory scanning (BPE, WordPiece, Unigram)
+ - Fallback behavior when `tokenizer.json` is missing
+ - CLI table and JSON output modes
+ - Tokenizer class name → algorithm mapping
+
+ ---
+
+ ## 🤝 Contributing
+
+ Contributions are welcome! Please open an issue or pull request.
+
+ 1. Fork the repository
+ 2. Create a feature branch (`git checkout -b feat/amazing-feature`)
+ 3. Commit your changes (`git commit -m 'feat: add amazing feature'`)
+ 4. Push the branch (`git push origin feat/amazing-feature`)
+ 5. Open a Pull Request
+
+ ---
+
+ ## 📜 License
+
+ This project is licensed under the **GNU Affero General Public License v3.0 or later (AGPL-3.0-or-later)**.
+
+ - ✅ Freedom to use, modify, and distribute
+ - 🔒 Copyleft: derivative works must remain under the AGPL
+ - 🌐 Network use counts as distribution — users who interact with the software remotely are entitled to its source code
+
+ See [LICENSE](LICENSE) for the full text: <https://www.gnu.org/licenses/agpl-3.0.html>
+
+ ---
+
+ ## 🔗 Related
+
+ - 📄 [TokenBreak Paper (arXiv 2506.07948v1)](https://arxiv.org/html/2506.07948v1)
+ - 🦾 [HuggingFace Transformers](https://github.com/huggingface/transformers)
+ - 🔬 [Adversarial Robustness Toolbox](https://github.com/Trusted-AI/adversarial-robustness-toolbox)
+ - 🛡️ [OWASP Machine Learning Security Top 10](https://owasp.org/www-project-machine-learning-security-top-10/)
+ - 🧠 [AI Village](https://aivillage.org/)