attest-detector 0.1.0 (tar.gz)

@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 nehabetai1809
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,270 @@
+ Metadata-Version: 2.1
+ Name: attest-detector
+ Version: 0.1.0
+ Summary: Detect what percentage of a text was written by AI, with per-sentence and per-paragraph breakdowns.
+ Author: nehabetai1809
+ License: MIT
+ Project-URL: Homepage, https://github.com/nehabetai1809/attest-detector
+ Project-URL: Repository, https://github.com/nehabetai1809/attest-detector
+ Project-URL: Issues, https://github.com/nehabetai1809/attest-detector/issues
+ Keywords: ai detection,text analysis,nlp,chatgpt,ai generated text
+ Classifier: Programming Language :: Python :: 3
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Topic :: Text Processing :: Linguistic
+ Classifier: Intended Audience :: Developers
+ Classifier: Intended Audience :: Science/Research
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: transformers>=4.35.0
+ Requires-Dist: torch>=2.0.0
+ Requires-Dist: nltk>=3.8.0
+ Requires-Dist: numpy>=1.24.0
+ Requires-Dist: langdetect>=1.0.9
+ Provides-Extra: pdf
+ Requires-Dist: pypdf>=4.0.0; extra == "pdf"
+ Provides-Extra: docx
+ Requires-Dist: python-docx>=1.1.0; extra == "docx"
+ Provides-Extra: all-formats
+ Requires-Dist: pypdf>=4.0.0; extra == "all-formats"
+ Requires-Dist: python-docx>=1.1.0; extra == "all-formats"
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
+
+ # attest-detector
+
+ [![Tests](https://github.com/nehabetai1809/attest-detector/actions/workflows/ci.yml/badge.svg)](https://github.com/nehabetai1809/attest-detector/actions/workflows/ci.yml)
+ [![PyPI version](https://badge.fury.io/py/attest-detector.svg)](https://pypi.org/project/attest-detector/)
+ [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+
+ **Detect what percentage of a piece of text was written by AI.**
+
+ `attest-detector` is an offline-friendly Python package that analyzes text with **three different AI detection models** and reports each model's score separately, so you can make your own informed judgement instead of trusting a single black-box score.
+
+ ## Why three models?
+
+ No single AI detection model is reliable on its own for modern AI text. Our testing gave these approximate rates:
+
+ | Model | AI text correctly flagged | Human text wrongly flagged (false positives) | Best used for |
+ |---|---|---|---|
+ | `fakespot` | ~100% | ~66% | Flagging potential AI writing |
+ | `roberta` | ~40% | ~6% | Confirming text is human-written |
+ | `chatgpt` | ~3% | ~0% | "Definitely human" confirmation |
+
+ Reading all three scores side by side gives a more honest picture than any single score: `fakespot` catches almost all AI text but also flags most human text, while `roberta` and `chatgpt` rarely flag human text but miss most AI text.
+
+ > ⚠️ **First run:** Each model (~500MB, so ~1.5GB for all three) is downloaded automatically the first time it is used and cached locally. Subsequent runs are fully offline.
+
+ ---
+
+ ## Installation
+
+ ```bash
+ pip install attest-detector
+ ```
+
+ With PDF support:
+ ```bash
+ pip install "attest-detector[pdf]"
+ ```
+
+ With Word (.docx) support:
+ ```bash
+ pip install "attest-detector[docx]"
+ ```
+
+ With both PDF and Word support:
+ ```bash
+ pip install "attest-detector[all-formats]"
+ ```
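PDF and Word input depend on the optional extras declared in the package metadata: `[pdf]` installs `pypdf`, `[docx]` installs `python-docx` (imported as `docx`), and `[all-formats]` installs both. A minimal sketch for checking which inputs the current environment can read before running the detector; `available_formats` is a hypothetical helper, not part of attest-detector, and it only checks importability, not whether the package itself accepts a given extension.

```python
import importlib.util

# Hypothetical helper, not part of attest-detector.
# Maps file extensions to the import name of the package each extra installs:
#   [pdf]  -> pypdf        (import name "pypdf")
#   [docx] -> python-docx  (import name "docx")
OPTIONAL_FORMATS = {
    ".pdf": "pypdf",
    ".docx": "docx",
}


def available_formats():
    """Return the file extensions the current environment can read.

    Plain text needs no extra; PDF and Word support depend on whether the
    corresponding optional package can be imported.
    """
    formats = {".txt"}
    for extension, module_name in OPTIONAL_FORMATS.items():
        if importlib.util.find_spec(module_name) is not None:
            formats.add(extension)
    return formats


if __name__ == "__main__":
    print(sorted(available_formats()))
```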
+
+ ---
+
+ ## Quick Start
+
+ ```python
+ from ai_detector import Detector
+
+ d = Detector()  # runs all three models by default
+ result = d.analyze("Your text goes here...")
+
+ print(result.summary())
+ ```
+
+ The distribution is named `attest-detector`, but the importable module is `ai_detector`.
+
+ ### Example Output
+
+ ```
+ ============================================================
+ Overall: 48.8% likely AI-written [Human]
+ Primary model: fakespot
+ ============================================================
+
+ Per-model scores (full document):
+ fakespot 100.0% [✓ AI]
+ roberta 40.3% [✗ Human]
+ chatgpt 2.9% [✗ Human]
+
+ Models flagging as AI: 1 / 3
+
+ Analyzed 2 sentence(s) across 1 paragraph(s).
+
+ Per-paragraph breakdown:
+ Para 1: 48.8% [█████████░░░░░░░░░░░] [Human] [1/3 models]
+ ```
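The per-paragraph bar is 20 characters wide, and for 48.8% it shows 9 filled cells, which matches truncating 0.488 × 20 = 9.76 rather than rounding it. A minimal sketch that reproduces the bar under that assumption; `render_bar` is a hypothetical name and the package's own formatting code may differ.

```python
def render_bar(score, width=20, filled="█", empty="░"):
    """Render a 0-1 score as a fixed-width text bar.

    Assumes the filled length is truncated with int(), which matches the
    9 of 20 cells shown for 48.8% in the example output.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    n_filled = int(score * width)
    return filled * n_filled + empty * (width - n_filled)


print(f"Para 1: {0.488:.1%} [{render_bar(0.488)}]")
```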
+
+ ### How to read the results
+
+ - **All three models flag AI** → Strong signal the text is AI-written
+ - **Only fakespot flags AI** → Uncertain; fakespot has a high false-positive rate (~66% on human text)
+ - **All three say Human** → Very likely human-written
+ - **roberta and chatgpt both say Human** → Strong signal the text is human-written
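The reading rules above can be expressed as a small function over `model_scores`. This is a hypothetical helper, not part of the package. It assumes a model flags AI when its score is strictly above the threshold, and it returns every rule that matches rather than picking one, because "Only fakespot flags AI" and "roberta and chatgpt both say Human" describe the same case whenever fakespot flags AI.

```python
def interpret(model_scores, threshold=0.5):
    """Return every reading rule from the README list that matches.

    Hypothetical helper: assumes a model flags AI when its score is strictly
    above `threshold`, and expects the keys 'fakespot', 'roberta', 'chatgpt'.
    """
    flags = {name: score > threshold for name, score in model_scores.items()}
    fakespot, roberta, chatgpt = flags["fakespot"], flags["roberta"], flags["chatgpt"]

    readings = []
    if fakespot and roberta and chatgpt:
        readings.append("Strong signal the text is AI-written")
    if fakespot and not roberta and not chatgpt:
        readings.append("Uncertain: fakespot has many false positives")
    if not (fakespot or roberta or chatgpt):
        readings.append("Very likely human-written")
    if not roberta and not chatgpt:
        readings.append("Strong signal the text is human-written")
    return readings  # empty when no rule in the list covers the combination


# Scores from the example output: fakespot 100.0%, roberta 40.3%, chatgpt 2.9%
for line in interpret({"fakespot": 1.0, "roberta": 0.403, "chatgpt": 0.029}):
    print(line)
```

For the example scores both of the overlapping rules match; the package's own summary labels that document `Human`. Combinations the list does not cover (for example fakespot and roberta flagging AI while chatgpt does not) return an empty list.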
+
+ ---
+
+ ## Per-Sentence and Per-Paragraph Breakdown
+
+ ```python
+ from ai_detector import Detector
+
+ result = Detector().analyze("Your text goes here...")
+
+ # Per-sentence scores
+ for sentence in result.sentences:
+     print(f"{sentence.ai_score:.1%} [{sentence.label}] {sentence.text}")
+     print(f" Model scores: {sentence.model_scores}")
+
+ # Per-paragraph scores
+ for para in result.paragraphs:
+     print(f"{para.ai_score:.1%} [{para.label}]")
+     print(f" {para.models_flagging_ai()}/{len(para.model_scores)} models flagged as AI")
+ ```
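The headline `ai_score` is documented as the primary model's score. If you instead want the share of the document that sits in sentences labelled `AI`, it can be computed from `result.sentences`. A minimal sketch using only the documented `text` and `label` attributes; weighting by character length and the helper name `share_flagged` are assumptions, not package behaviour.

```python
from collections import namedtuple


def share_flagged(segments):
    """Fraction of characters belonging to segments labelled 'AI'.

    Works on any objects exposing `.text` and `.label`, such as the Segment
    objects in `result.sentences` or `result.paragraphs`.
    """
    total = sum(len(seg.text) for seg in segments)
    if total == 0:
        return 0.0
    flagged = sum(len(seg.text) for seg in segments if seg.label == "AI")
    return flagged / total


# Demo with stand-in segments; with the real package, pass result.sentences.
Seg = namedtuple("Seg", ["text", "label"])
demo = [
    Seg("Short human line.", "Human"),
    Seg("A longer sentence the detector labelled as AI.", "AI"),
]
print(f"{share_flagged(demo):.1%} of characters are in AI-labelled sentences")
```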
+
+ ---
+
+ ## Single Model Mode
+
+ ```python
+ from ai_detector import Detector
+
+ # Use a specific model instead of all three
+ d = Detector(model='fakespot')  # best at catching AI
+ d = Detector(model='roberta')   # best at confirming human
+ d = Detector(model='chatgpt')   # near-zero false positives
+ ```
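Model selection maps a short key to a Hugging Face model ID from the Available Models table, with `model=None` meaning all three. A sketch of that lookup; `resolve_models` is hypothetical, and rejecting unknown keys with `ValueError` is an assumption, since the package's error handling is not documented here.

```python
# Keys and Hugging Face model IDs as listed in the "Available Models" table.
MODEL_IDS = {
    "fakespot": "fakespot-ai/roberta-base-ai-text-detection-v1",
    "roberta": "openai-community/roberta-base-openai-detector",
    "chatgpt": "Hello-SimpleAI/chatgpt-detector-roberta",
}


def resolve_models(model=None):
    """Return {key: model_id} for the models Detector(model=...) would run.

    Hypothetical helper: None selects all three, a known key selects one,
    anything else raises ValueError.
    """
    if model is None:
        return dict(MODEL_IDS)
    if model not in MODEL_IDS:
        valid = ", ".join(sorted(MODEL_IDS))
        raise ValueError(f"unknown model {model!r}; expected one of: {valid}")
    return {model: MODEL_IDS[model]}


print(resolve_models("chatgpt"))
```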
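+
+ ---
+
+ ## Adjusting Sensitivity
+
+ The threshold is the score above which a model's verdict becomes `AI` (default `0.5`). Lower values flag more text as AI; higher values flag less.
+
+ ```python
+ from ai_detector import Detector
+
+ # More sensitive: flags more text as AI
+ d = Detector(threshold=0.3)
+
+ # Less sensitive: only flags text scored as strongly AI
+ d = Detector(threshold=0.7)
+ ```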
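Using roberta's score from the example output (40.3%), a sketch shows how the default and the two settings above change the verdict. It assumes a strictly-greater-than comparison; the README's "above threshold" wording suggests this but does not say what happens to a score exactly equal to the threshold.

```python
def label(score, threshold=0.5):
    """Return 'AI' if score is strictly above threshold, else 'Human' (assumed rule)."""
    return "AI" if score > threshold else "Human"


roberta_score = 0.403  # roberta's full-document score in the example output
for threshold in (0.3, 0.5, 0.7):
    print(f"threshold={threshold}: {label(roberta_score, threshold)}")
# threshold=0.3: AI
# threshold=0.5: Human
# threshold=0.7: Human
```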
+
+ ---
+
+ ## CLI Usage
+
+ After installation, the `attest-detect` command is available in your terminal, so no Python code is needed:
+
+ ```bash
+ # Analyze a text file (runs all three models)
+ attest-detect essay.txt
+
+ # Analyze a PDF (requires the [pdf] extra)
+ attest-detect paper.pdf
+
+ # Analyze a Word document (requires the [docx] extra)
+ attest-detect report.docx
+
+ # Analyze text directly
+ attest-detect --text "Paste your text here"
+
+ # Use a specific model
+ attest-detect essay.txt --model fakespot
+
+ # Show per-sentence breakdown
+ attest-detect essay.txt --sentences
+
+ # Adjust sensitivity
+ attest-detect essay.txt --threshold 0.3
+
+ # List available models
+ attest-detect --list-models
+ ```
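For batch jobs the CLI can be driven from Python. A minimal sketch that builds the argument list from the flags shown above and runs it with `subprocess`; `build_cli_args` and `run_cli` are hypothetical helpers, and treating a non-zero exit status as failure is an assumption about the CLI.

```python
import shutil
import subprocess


def build_cli_args(path, model=None, threshold=None, sentences=False):
    """Build an attest-detect command line using only the documented flags."""
    args = ["attest-detect", str(path)]
    if model is not None:
        args += ["--model", model]
    if threshold is not None:
        args += ["--threshold", str(threshold)]
    if sentences:
        args.append("--sentences")
    return args


def run_cli(path, **options):
    """Run attest-detect on one file and return its stdout as text."""
    if shutil.which("attest-detect") is None:
        raise RuntimeError("attest-detect not found; run `pip install attest-detector` first")
    completed = subprocess.run(
        build_cli_args(path, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout
```

Usage: `for name in ["essay.txt", "notes.txt"]: print(run_cli(name, threshold=0.3, sentences=True))`.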
+
+ ---
+
+ ## API Reference
+
+ ### `Detector(model=None, threshold=0.5, device=None)`
+
+ | Parameter | Type | Default | Description |
+ |---|---|---|---|
+ | `model` | str or None | `None` | Run a single model: `'roberta'`, `'chatgpt'`, or `'fakespot'`. If `None`, runs all three. |
+ | `threshold` | float | `0.5` | AI cutoff. Lower values are more sensitive. |
+ | `device` | str or None | `None` | `'cpu'`, `'cuda'`, or `'mps'`. Auto-detected if `None`. |
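`device=None` is documented as auto-detected, but the order the package checks is not stated. A common PyTorch pattern, sketched here as an assumption, prefers CUDA, then Apple's MPS backend, then CPU; `pick_device` is a hypothetical name and falls back to `'cpu'` if torch is not importable.

```python
def pick_device(preferred=None):
    """Return an explicit device, or auto-detect one (assumed order: cuda, mps, cpu)."""
    if preferred is not None:
        return preferred
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on very old torch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```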
+
+ ### `DetectionResult`
+
+ | Attribute/Method | Type | Description |
+ |---|---|---|
+ | `ai_score` | float | Primary model's overall AI probability (0–1) |
+ | `label` | str | `'AI'` or `'Human'` |
+ | `model_scores` | dict | Each model's score, e.g. `{'fakespot': 1.0, 'roberta': 0.4, 'chatgpt': 0.03}` |
+ | `sentences` | list | Per-sentence `Segment` objects |
+ | `paragraphs` | list | Per-paragraph `Segment` objects |
+ | `models_flagging_ai()` | int | Number of models that scored above the threshold |
+ | `summary()` | str | Formatted text summary |
+
+ ### `Segment`
+
+ | Attribute/Method | Type | Description |
+ |---|---|---|
+ | `text` | str | Original text of the segment |
+ | `ai_score` | float | Primary model's AI probability for this segment (0–1) |
+ | `model_scores` | dict | Each model's score for this segment |
+ | `label` | str | `'AI'` or `'Human'` |
+ | `segment_type` | str | `'sentence'` or `'paragraph'` |
+ | `models_flagging_ai(threshold)` | int | Number of models that scored above `threshold` |
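Because each model is a ~500MB download, code that consumes results can be unit-tested against a lightweight stand-in with the same attributes as `Segment`. A sketch using a dataclass; `FakeSegment` is a hypothetical test double, and its `models_flagging_ai` assumes a strictly-above comparison with a `0.5` default, which the table does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeSegment:
    """Test double mirroring the documented Segment attributes."""
    text: str
    ai_score: float
    label: str
    segment_type: str = "sentence"  # 'sentence' or 'paragraph'
    model_scores: Dict[str, float] = field(default_factory=dict)

    def models_flagging_ai(self, threshold=0.5):
        """Count models whose score is strictly above threshold (assumed rule)."""
        return sum(1 for score in self.model_scores.values() if score > threshold)


# Values mirror the example output's single paragraph.
para = FakeSegment(
    text="Example paragraph.",
    ai_score=0.488,
    label="Human",
    segment_type="paragraph",
    model_scores={"fakespot": 1.0, "roberta": 0.403, "chatgpt": 0.029},
)
print(f"{para.models_flagging_ai()}/{len(para.model_scores)} models flagged as AI")
# 1/3 models flagged as AI
```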
+
+ ---
+
+ ## Available Models
+
+ | Key | Hugging Face model | Trained on | Size |
+ |---|---|---|---|
+ | `fakespot` | `fakespot-ai/roberta-base-ai-text-detection-v1` | Modern AI text | ~500MB |
+ | `roberta` | `openai-community/roberta-base-openai-detector` | GPT-2 output (2019) | ~500MB |
+ | `chatgpt` | `Hello-SimpleAI/chatgpt-detector-roberta` | ChatGPT output (2023) | ~500MB |
+
+ ---
+
+ ## Limitations
+
+ - No single model reliably detects all modern AI text, so read all three scores together
+ - All models are English-only
+ - Very short texts (fewer than two sentences) may produce unreliable scores
+ - AI detection is probabilistic; treat results as signals, not verdicts
+ - Paraphrased or lightly edited AI text may score lower than expected
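Two of these limitations can be checked before spending time on inference. A minimal pre-flight sketch; `check_input` is hypothetical, the sentence count uses a naive punctuation split (the package depends on nltk and may split differently), and the language check uses `langdetect`, a declared dependency, only when it is importable.

```python
import re


def check_input(text, min_sentences=2):
    """Return warnings for inputs the Limitations section calls unreliable."""
    warnings = []

    # Naive split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if len(sentences) < min_sentences:
        warnings.append(f"only {len(sentences)} sentence(s); scores may be unreliable")

    try:
        from langdetect import DetectorFactory, detect
    except ImportError:
        return warnings  # language check skipped when langdetect is missing

    DetectorFactory.seed = 0  # langdetect is non-deterministic without a fixed seed
    try:
        language = detect(text)
    except Exception:  # langdetect raises on text with no detectable features
        return warnings
    if language != "en":
        warnings.append(f"detected language {language!r}; models are English-only")
    return warnings


print(check_input("Just one sentence here."))
```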
+
+ ---
+
+ ## Running Tests
+
+ ```bash
+ pip install -e ".[dev]"
+ pytest tests/ -v
+ ```
+
+ ---
+
+ ## Contributing
+
+ Contributions are welcome! Please open an issue first to discuss changes.
+
+ 1. Fork the repo
+ 2. Create a feature branch (`git checkout -b feature/my-feature`)
+ 3. Commit your changes
+ 4. Push and open a Pull Request
+
+ ---
+
+ ## License
+
+ MIT — see [LICENSE](LICENSE) for details.
@@ -0,0 +1,23 @@
+ """
+ attest-detector
+ ---------------
+ Detect what percentage of a text was written by AI.
+ Provides an overall score plus per-sentence and per-paragraph breakdowns.
+
+ Basic usage:
+     from ai_detector import Detector
+
+     d = Detector()
+     result = d.analyze("Your text goes here...")
+
+     print(result.ai_score)   # e.g. 0.82 → 82% likely AI
+     print(result.summary())  # human-readable summary
+     for s in result.sentences:
+         print(s.text, s.ai_score)  # per-sentence scores
+ """
+
+ from .detector import Detector
+ from .result import DetectionResult, Segment
+
+ __version__ = "0.1.0"
+ __all__ = ["Detector", "DetectionResult", "Segment"]