PyPI - attest-detector - Versions diffs - 0.1.0__tar.gz - Mend

attest-detector 0.1.0__tar.gz

Files changed (20)

attest_detector-0.1.0/LICENSE +21 -0
attest_detector-0.1.0/PKG-INFO +270 -0
attest_detector-0.1.0/README.md +235 -0
attest_detector-0.1.0/ai_detector/__init__.py +23 -0
attest_detector-0.1.0/ai_detector/cli.py +171 -0
attest_detector-0.1.0/ai_detector/detector.py +315 -0
attest_detector-0.1.0/ai_detector/models.py +150 -0
attest_detector-0.1.0/ai_detector/readers.py +108 -0
attest_detector-0.1.0/ai_detector/result.py +196 -0
attest_detector-0.1.0/ai_detector/segmenter.py +56 -0
attest_detector-0.1.0/ai_detector/utils.py +75 -0
attest_detector-0.1.0/attest_detector.egg-info/PKG-INFO +270 -0
attest_detector-0.1.0/attest_detector.egg-info/SOURCES.txt +18 -0
attest_detector-0.1.0/attest_detector.egg-info/dependency_links.txt +1 -0
attest_detector-0.1.0/attest_detector.egg-info/entry_points.txt +2 -0
attest_detector-0.1.0/attest_detector.egg-info/requires.txt +19 -0
attest_detector-0.1.0/attest_detector.egg-info/top_level.txt +1 -0
attest_detector-0.1.0/pyproject.toml +63 -0
attest_detector-0.1.0/setup.cfg +4 -0
attest_detector-0.1.0/tests/test_detector.py +283 -0

attest_detector-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 nehabetai1809
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

attest_detector-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,270 @@
+Metadata-Version: 2.1
+Name: attest-detector
+Version: 0.1.0
+Summary: Detect what percentage of a text was written by AI, with per-sentence and per-paragraph breakdowns.
+Author: nehabetai1809
+License: MIT
+Project-URL: Homepage, https://github.com/nehabetai1809/attest-detector
+Project-URL: Repository, https://github.com/nehabetai1809/attest-detector
+Project-URL: Issues, https://github.com/nehabetai1809/attest-detector/issues
+Keywords: ai detection,text analysis,nlp,chatgpt,ai generated text
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Topic :: Text Processing :: Linguistic
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Science/Research
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: transformers>=4.35.0
+Requires-Dist: torch>=2.0.0
+Requires-Dist: nltk>=3.8.0
+Requires-Dist: numpy>=1.24.0
+Requires-Dist: langdetect>=1.0.9
+Provides-Extra: pdf
+Requires-Dist: pypdf>=4.0.0; extra == "pdf"
+Provides-Extra: docx
+Requires-Dist: python-docx>=1.1.0; extra == "docx"
+Provides-Extra: all-formats
+Requires-Dist: pypdf>=4.0.0; extra == "all-formats"
+Requires-Dist: python-docx>=1.1.0; extra == "all-formats"
+Provides-Extra: dev
+Requires-Dist: pytest>=7.0.0; extra == "dev"
+Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
+# attest-detector
+[![Tests](https://github.com/nehabetai1809/attest-detector/actions/workflows/ci.yml/badge.svg)](https://github.com/nehabetai1809/attest-detector/actions/workflows/ci.yml)
+[![PyPI version](https://badge.fury.io/py/attest-detector.svg)](https://pypi.org/project/attest-detector/)
+[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+**Detect what percentage of a piece of text was written by AI.**
+`attest-detector` is an offline-friendly Python package that analyzes text using **three different AI detection models** and reports each model's score separately — so you can make your own informed judgement rather than trusting a single black-box score.
+## Why three models?
+No single AI detection model is reliable on its own for modern AI text. After testing, here's what we found:
+| Model | Catches AI text | False positives on human text | Best used for |
+|---|---|---|---|
+| `fakespot` | ~100% | ~66% | Flagging potential AI writing |
+| `roberta` | ~40% | ~6% | Confirming text is human-written |
+| `chatgpt` | ~3% | ~0% | "Definitely human" confirmation |
+Running all three together and reading them side by side gives a much more honest picture than any single score.
+> ⚠️ **First run:** Each model (~500MB) is downloaded automatically and cached locally. Subsequent runs are fully offline.
+---
+## Installation
+```bash
+pip install attest-detector
+```
+With PDF support:
+```bash
+pip install attest-detector[pdf]
+```
+With Word (.docx) support:
+```bash
+pip install attest-detector[docx]
+```
+---
+## Quick Start
+```python
+from ai_detector import Detector
+d = Detector()  # runs all three models by default
+result = d.analyze("Your text goes here...")
+print(result.summary())
+```
+### Example Output
+```
+============================================================
+  Overall: 48.8% likely AI-written  [Human]
+  Primary model: fakespot
+============================================================
+  Per-model scores (full document):
+    fakespot      100.0%  [✓ AI]
+    roberta        40.3%  [✗ Human]
+    chatgpt         2.9%  [✗ Human]
+  Models flagging as AI: 1 / 3
+  Analyzed 2 sentence(s) across 1 paragraph(s).
+  Per-paragraph breakdown:
+    Para 1:  48.8%  [█████████░░░░░░░░░░░]  [Human]  [1/3 models]
+```
+### How to read the results
+- **All three models flag AI** → Strong signal the text is AI-written
+- **Only fakespot flags AI** → Uncertain — fakespot has many false positives
+- **All three say Human** → Very likely human-written
+- **roberta and chatgpt both say Human** → Strong signal text is human-written
+---
+## Per-Sentence and Per-Paragraph Breakdown
+```python
+# Per-sentence scores
+for sentence in result.sentences:
+    print(f"{sentence.ai_score:.1%}  [{sentence.label}]  {sentence.text}")
+    print(f"  Model scores: {sentence.model_scores}")
+# Per-paragraph scores
+for para in result.paragraphs:
+    print(f"{para.ai_score:.1%}  [{para.label}]")
+    print(f"  {para.models_flagging_ai()}/{len(para.model_scores)} models flagged as AI")
+```
+---
+## Single Model Mode
+```python
+# Use a specific model instead of all three
+d = Detector(model='fakespot')   # best at catching AI
+d = Detector(model='roberta')    # best at confirming human
+d = Detector(model='chatgpt')    # near-zero false positives
+```
+---
+## Adjusting Sensitivity
+```python
+# More sensitive — flags more text as AI
+d = Detector(threshold=0.3)
+# Less sensitive — only flags strongly AI text
+d = Detector(threshold=0.7)
+```
+---
+## CLI Usage
+After installation, use `attest-detect` directly in your terminal — no Python needed:
+```bash
+# Analyze a text file (runs all three models)
+attest-detect essay.txt
+# Analyze a PDF
+attest-detect paper.pdf
+# Analyze a Word document
+attest-detect report.docx
+# Analyze text directly
+attest-detect --text "Paste your text here"
+# Use a specific model
+attest-detect essay.txt --model fakespot
+# Show per-sentence breakdown
+attest-detect essay.txt --sentences
+# Adjust sensitivity
+attest-detect essay.txt --threshold 0.3
+# List available models
+attest-detect --list-models
+```
+---
+## API Reference
+### `Detector(model=None, threshold=0.5, device=None)`
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `model` | str | `None` | Run a single model: `'roberta'`, `'chatgpt'`, `'fakespot'`. If not set, runs all three. |
+| `threshold` | float | `0.5` | AI cutoff. Lower = more sensitive |
+| `device` | str | `None` | `'cpu'`, `'cuda'`, `'mps'`. Auto-detected |
+### `DetectionResult`
+| Attribute/Method | Type | Description |
+|---|---|---|
+| `ai_score` | float | Primary model's overall AI probability (0–1) |
+| `label` | str | `'AI'` or `'Human'` |
+| `model_scores` | dict | Every model's score e.g. `{'fakespot': 1.0, 'roberta': 0.4, 'chatgpt': 0.03}` |
+| `sentences` | list | Per-sentence `Segment` list |
+| `paragraphs` | list | Per-paragraph `Segment` list |
+| `models_flagging_ai()` | int | Count of models that scored above threshold |
+| `summary()` | str | Formatted text summary |
+### `Segment`
+| Attribute/Method | Type | Description |
+|---|---|---|
+| `text` | str | Original text |
+| `ai_score` | float | Primary model's AI probability |
+| `model_scores` | dict | Every model's score for this segment |
+| `label` | str | `'AI'` or `'Human'` |
+| `segment_type` | str | `'sentence'` or `'paragraph'` |
+| `models_flagging_ai(threshold)` | int | Count of models above threshold |
+---
+## Available Models
+| Key | Model | Trained On | Size |
+|---|---|---|---|
+| `fakespot` | `fakespot-ai/roberta-base-ai-text-detection-v1` | Modern AI text | ~500MB |
+| `roberta` | `openai-community/roberta-base-openai-detector` | GPT-2 output (2019) | ~500MB |
+| `chatgpt` | `Hello-SimpleAI/chatgpt-detector-roberta` | ChatGPT output (2023) | ~500MB |
+---
+## Limitations
+- No single model reliably detects all modern AI text — use all three together
+- All models are English-only
+- Very short texts (< 2 sentences) may produce unreliable scores
+- AI detection is probabilistic — treat results as signals, not verdicts
+- Paraphrased or lightly edited AI text may score lower than expected
+---
+## Running Tests
+```bash
+pip install -e ".[dev]"
+pytest tests/ -v
+```
+---
+## Contributing
+Contributions are welcome! Please open an issue first to discuss changes.
+1. Fork the repo
+2. Create a feature branch (`git checkout -b feature/my-feature`)
+3. Commit your changes
+4. Push and open a Pull Request
+---
+## License
+MIT — see [LICENSE](LICENSE) for details.
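
Reviewer note (not part of the package): the "Why three models?" table in the description above gives approximate catch rates and false-positive rates. The sketch below turns those into the probability that a verdict is correct, using Bayes' rule. It treats the table's rounded figures as exact and assumes a 50% prior that any given text is AI-written; both are assumptions, and the function names are hypothetical rather than anything shipped in `ai_detector`.

```python
def p_ai_given_flag(catch_rate, false_positive_rate, prior_ai=0.5):
    """P(text is AI | the model flags it as AI), by Bayes' rule."""
    flagged_ai = catch_rate * prior_ai
    flagged_human = false_positive_rate * (1.0 - prior_ai)
    total = flagged_ai + flagged_human
    if total == 0:
        raise ValueError("the model never flags anything at these rates")
    return flagged_ai / total


def p_human_given_clear(catch_rate, false_positive_rate, prior_ai=0.5):
    """P(text is human | the model says Human), by Bayes' rule."""
    clear_human = (1.0 - false_positive_rate) * (1.0 - prior_ai)
    clear_ai = (1.0 - catch_rate) * prior_ai
    total = clear_human + clear_ai
    if total == 0:
        raise ValueError("the model flags everything at these rates")
    return clear_human / total


# (catch rate, false-positive rate) per model, read off the README table
# and treated as exact values (an assumption; the table says "~").
README_RATES = {
    "fakespot": (1.00, 0.66),
    "roberta": (0.40, 0.06),
    "chatgpt": (0.03, 0.00),
}

for name, (catch, fp) in README_RATES.items():
    print(
        f"{name:<9} "
        f"P(AI | flagged) = {p_ai_given_flag(catch, fp):.2f}   "
        f"P(human | not flagged) = {p_human_given_clear(catch, fp):.2f}"
    )
```

With these inputs the posteriors depend heavily on the assumed prior, so passing a different `prior_ai` is the first thing to try before reading anything into the numbers.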

attest_detector-0.1.0/README.md ADDED

@@ -0,0 +1,235 @@
+# attest-detector
+[![Tests](https://github.com/nehabetai1809/attest-detector/actions/workflows/ci.yml/badge.svg)](https://github.com/nehabetai1809/attest-detector/actions/workflows/ci.yml)
+[![PyPI version](https://badge.fury.io/py/attest-detector.svg)](https://pypi.org/project/attest-detector/)
+[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+**Detect what percentage of a piece of text was written by AI.**
+`attest-detector` is an offline-friendly Python package that analyzes text using **three different AI detection models** and reports each model's score separately — so you can make your own informed judgement rather than trusting a single black-box score.
+## Why three models?
+No single AI detection model is reliable on its own for modern AI text. After testing, here's what we found:
+| Model | Catches AI text | False positives on human text | Best used for |
+|---|---|---|---|
+| `fakespot` | ~100% | ~66% | Flagging potential AI writing |
+| `roberta` | ~40% | ~6% | Confirming text is human-written |
+| `chatgpt` | ~3% | ~0% | "Definitely human" confirmation |
+Running all three together and reading them side by side gives a much more honest picture than any single score.
+> ⚠️ **First run:** Each model (~500MB) is downloaded automatically and cached locally. Subsequent runs are fully offline.
+---
+## Installation
+```bash
+pip install attest-detector
+```
+With PDF support:
+```bash
+pip install attest-detector[pdf]
+```
+With Word (.docx) support:
+```bash
+pip install attest-detector[docx]
+```
+---
+## Quick Start
+```python
+from ai_detector import Detector
+d = Detector()  # runs all three models by default
+result = d.analyze("Your text goes here...")
+print(result.summary())
+```
+### Example Output
+```
+============================================================
+  Overall: 48.8% likely AI-written  [Human]
+  Primary model: fakespot
+============================================================
+  Per-model scores (full document):
+    fakespot      100.0%  [✓ AI]
+    roberta        40.3%  [✗ Human]
+    chatgpt         2.9%  [✗ Human]
+  Models flagging as AI: 1 / 3
+  Analyzed 2 sentence(s) across 1 paragraph(s).
+  Per-paragraph breakdown:
+    Para 1:  48.8%  [█████████░░░░░░░░░░░]  [Human]  [1/3 models]
+```
+### How to read the results
+- **All three models flag AI** → Strong signal the text is AI-written
+- **Only fakespot flags AI** → Uncertain — fakespot has many false positives
+- **All three say Human** → Very likely human-written
+- **roberta and chatgpt both say Human** → Strong signal text is human-written
+---
+## Per-Sentence and Per-Paragraph Breakdown
+```python
+# Per-sentence scores
+for sentence in result.sentences:
+    print(f"{sentence.ai_score:.1%}  [{sentence.label}]  {sentence.text}")
+    print(f"  Model scores: {sentence.model_scores}")
+# Per-paragraph scores
+for para in result.paragraphs:
+    print(f"{para.ai_score:.1%}  [{para.label}]")
+    print(f"  {para.models_flagging_ai()}/{len(para.model_scores)} models flagged as AI")
+```
+---
+## Single Model Mode
+```python
+# Use a specific model instead of all three
+d = Detector(model='fakespot')   # best at catching AI
+d = Detector(model='roberta')    # best at confirming human
+d = Detector(model='chatgpt')    # near-zero false positives
+```
+---
+## Adjusting Sensitivity
+```python
+# More sensitive — flags more text as AI
+d = Detector(threshold=0.3)
+# Less sensitive — only flags strongly AI text
+d = Detector(threshold=0.7)
+```
+---
+## CLI Usage
+After installation, use `attest-detect` directly in your terminal — no Python needed:
+```bash
+# Analyze a text file (runs all three models)
+attest-detect essay.txt
+# Analyze a PDF
+attest-detect paper.pdf
+# Analyze a Word document
+attest-detect report.docx
+# Analyze text directly
+attest-detect --text "Paste your text here"
+# Use a specific model
+attest-detect essay.txt --model fakespot
+# Show per-sentence breakdown
+attest-detect essay.txt --sentences
+# Adjust sensitivity
+attest-detect essay.txt --threshold 0.3
+# List available models
+attest-detect --list-models
+```
+---
+## API Reference
+### `Detector(model=None, threshold=0.5, device=None)`
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `model` | str | `None` | Run a single model: `'roberta'`, `'chatgpt'`, `'fakespot'`. If not set, runs all three. |
+| `threshold` | float | `0.5` | AI cutoff. Lower = more sensitive |
+| `device` | str | `None` | `'cpu'`, `'cuda'`, `'mps'`. Auto-detected |
+### `DetectionResult`
+| Attribute/Method | Type | Description |
+|---|---|---|
+| `ai_score` | float | Primary model's overall AI probability (0–1) |
+| `label` | str | `'AI'` or `'Human'` |
+| `model_scores` | dict | Every model's score e.g. `{'fakespot': 1.0, 'roberta': 0.4, 'chatgpt': 0.03}` |
+| `sentences` | list | Per-sentence `Segment` list |
+| `paragraphs` | list | Per-paragraph `Segment` list |
+| `models_flagging_ai()` | int | Count of models that scored above threshold |
+| `summary()` | str | Formatted text summary |
+### `Segment`
+| Attribute/Method | Type | Description |
+|---|---|---|
+| `text` | str | Original text |
+| `ai_score` | float | Primary model's AI probability |
+| `model_scores` | dict | Every model's score for this segment |
+| `label` | str | `'AI'` or `'Human'` |
+| `segment_type` | str | `'sentence'` or `'paragraph'` |
+| `models_flagging_ai(threshold)` | int | Count of models above threshold |
+---
+## Available Models
+| Key | Model | Trained On | Size |
+|---|---|---|---|
+| `fakespot` | `fakespot-ai/roberta-base-ai-text-detection-v1` | Modern AI text | ~500MB |
+| `roberta` | `openai-community/roberta-base-openai-detector` | GPT-2 output (2019) | ~500MB |
+| `chatgpt` | `Hello-SimpleAI/chatgpt-detector-roberta` | ChatGPT output (2023) | ~500MB |
+---
+## Limitations
+- No single model reliably detects all modern AI text — use all three together
+- All models are English-only
+- Very short texts (< 2 sentences) may produce unreliable scores
+- AI detection is probabilistic — treat results as signals, not verdicts
+- Paraphrased or lightly edited AI text may score lower than expected
+---
+## Running Tests
+```bash
+pip install -e ".[dev]"
+pytest tests/ -v
+```
+---
+## Contributing
+Contributions are welcome! Please open an issue first to discuss changes.
+1. Fork the repo
+2. Create a feature branch (`git checkout -b feature/my-feature`)
+3. Commit your changes
+4. Push and open a Pull Request
+---
+## License
+MIT — see [LICENSE](LICENSE) for details.
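
Reviewer note (not part of the package): the "How to read the results" rules in the README above can be sketched as a small function over the documented `model_scores` dict. The function name, the returned strings, and the strict `>` comparison are assumptions for illustration. The README's fourth rule (roberta and chatgpt both say Human) overlaps the second and third when all three models run, so this sketch lets "only fakespot flags AI" take precedence, which is also an assumption.

```python
EXPECTED_MODELS = ("fakespot", "roberta", "chatgpt")


def interpret(model_scores, threshold=0.5):
    """Map a three-model score dict to the README's reading guidance.

    `model_scores` is shaped like the documented attribute, e.g.
    {'fakespot': 1.0, 'roberta': 0.4, 'chatgpt': 0.03}.
    """
    missing = [name for name in EXPECTED_MODELS if name not in model_scores]
    if missing:
        raise KeyError(f"missing scores for: {', '.join(missing)}")

    # Models whose score is above the cutoff count as flagging AI.
    flagged = {
        name for name in EXPECTED_MODELS if model_scores[name] > threshold
    }

    if len(flagged) == len(EXPECTED_MODELS):
        return "strong signal: AI-written"
    if not flagged:
        return "very likely human-written"
    if flagged == {"fakespot"}:
        return "uncertain: fakespot alone has many false positives"
    # Any other combination is not covered by the README's rules.
    return "mixed signal: not covered by the README"


# Scores taken from the README's example output.
example = {"fakespot": 1.0, "roberta": 0.403, "chatgpt": 0.029}
print(interpret(example))
print(interpret(example, threshold=0.3))
```

Lowering the threshold to 0.3 moves `roberta` (40.3%) over the cutoff, so the same text falls into the uncovered "mixed" case; this is the practical effect of the sensitivity setting the README describes.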

attest_detector-0.1.0/ai_detector/__init__.py ADDED

@@ -0,0 +1,23 @@
+"""
+attest-detector
+---------------
+Detect what percentage of a text was written by AI.
+Provides overall score, per-sentence, and per-paragraph breakdowns.
+Basic usage:
+    from ai_detector import Detector
+    d = Detector()
+    result = d.analyze("Your text goes here...")
+    print(result.ai_score)          # e.g. 0.82  → 82% likely AI
+    print(result.summary())         # human-readable summary
+    for s in result.sentences:
+        print(s.text, s.ai_score)   # per-sentence scores
+"""
+from .detector import Detector
+from .result import DetectionResult, Segment
+__version__ = "0.1.0"
+__all__ = ["Detector", "DetectionResult", "Segment"]
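
Reviewer note (not part of the package): the docstring above shows `ai_score` as a 0-1 probability, and the README documents a `label` of `'AI'` or `'Human'` driven by a `threshold` cutoff. The sketch below shows how such a score could map to a label and a percentage string. `label_for` and `describe` are hypothetical helpers, and whether the real package compares with `>` or `>=` is not visible in this part of the diff, so the strict comparison is an assumption.

```python
def label_for(ai_score, threshold=0.5):
    """Return 'AI' when the score is above the cutoff, else 'Human'."""
    if not 0.0 <= ai_score <= 1.0:
        raise ValueError("ai_score must be a probability between 0 and 1")
    return "AI" if ai_score > threshold else "Human"


def describe(ai_score, threshold=0.5):
    """Format a score in the style of the README's summary line."""
    return f"{ai_score:.1%} likely AI-written  [{label_for(ai_score, threshold)}]"


# The docstring's example score, then the README's overall score at
# the default cutoff and at the more sensitive 0.3 cutoff.
print(describe(0.82))
print(describe(0.488))
print(describe(0.488, threshold=0.3))
```

A lower threshold turns the same 48.8% score from `Human` into `AI`, which matches the README's "Lower = more sensitive" description.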