slopscore-lint 0.4.2__tar.gz

Files changed (124)
  1. slopscore_lint-0.4.2/.github/workflows/ci.yml +26 -0
  2. slopscore_lint-0.4.2/.github/workflows/docs.yml +21 -0
  3. slopscore_lint-0.4.2/.github/workflows/publish.yml +38 -0
  4. slopscore_lint-0.4.2/.gitignore +38 -0
  5. slopscore_lint-0.4.2/.pre-commit-config.yaml +13 -0
  6. slopscore_lint-0.4.2/.pre-commit-hooks.yaml +10 -0
  7. slopscore_lint-0.4.2/CHANGELOG.md +58 -0
  8. slopscore_lint-0.4.2/CLAUDE.md +156 -0
  9. slopscore_lint-0.4.2/DATA_SOURCES.md +40 -0
  10. slopscore_lint-0.4.2/LICENSE +21 -0
  11. slopscore_lint-0.4.2/MODEL_CARD.md +109 -0
  12. slopscore_lint-0.4.2/PKG-INFO +191 -0
  13. slopscore_lint-0.4.2/PROFILE_NOTES.md +19 -0
  14. slopscore_lint-0.4.2/README.md +132 -0
  15. slopscore_lint-0.4.2/SECURITY.md +12 -0
  16. slopscore_lint-0.4.2/action.yml +56 -0
  17. slopscore_lint-0.4.2/docs/baseline.md +17 -0
  18. slopscore_lint-0.4.2/docs/configuration.md +24 -0
  19. slopscore_lint-0.4.2/docs/index.md +35 -0
  20. slopscore_lint-0.4.2/docs/limitations.md +29 -0
  21. slopscore_lint-0.4.2/docs/suppression.md +25 -0
  22. slopscore_lint-0.4.2/eval/datasets/seed.jsonl +54 -0
  23. slopscore_lint-0.4.2/eval/results/realcorpus.json +36 -0
  24. slopscore_lint-0.4.2/mkdocs.yml +34 -0
  25. slopscore_lint-0.4.2/pyproject.toml +151 -0
  26. slopscore_lint-0.4.2/scripts/eval/_common.py +40 -0
  27. slopscore_lint-0.4.2/scripts/eval/build_seed.py +147 -0
  28. slopscore_lint-0.4.2/scripts/eval/experiment.py +117 -0
  29. slopscore_lint-0.4.2/scripts/eval/fetch.py +85 -0
  30. slopscore_lint-0.4.2/scripts/eval/train.py +107 -0
  31. slopscore_lint-0.4.2/src/slopscore/__init__.py +14 -0
  32. slopscore_lint-0.4.2/src/slopscore/cli.py +416 -0
  33. slopscore_lint-0.4.2/src/slopscore/config.py +49 -0
  34. slopscore_lint-0.4.2/src/slopscore/config_file.py +111 -0
  35. slopscore_lint-0.4.2/src/slopscore/core.py +118 -0
  36. slopscore_lint-0.4.2/src/slopscore/data/lexicons/markers.yaml +145 -0
  37. slopscore_lint-0.4.2/src/slopscore/data/model/slopscore-v0.3.json +58 -0
  38. slopscore_lint-0.4.2/src/slopscore/data/patterns/attribution/overattribution.yaml +28 -0
  39. slopscore_lint-0.4.2/src/slopscore/data/patterns/attribution/weasel.yaml +33 -0
  40. slopscore_lint-0.4.2/src/slopscore/data/patterns/claims/unsupported_universal.yaml +40 -0
  41. slopscore_lint-0.4.2/src/slopscore/data/patterns/copula/copula.yaml +28 -0
  42. slopscore_lint-0.4.2/src/slopscore/data/patterns/formulaic.yaml +64 -0
  43. slopscore_lint-0.4.2/src/slopscore/data/patterns/parallelism/parallelism.yaml +39 -0
  44. slopscore_lint-0.4.2/src/slopscore/data/patterns/prompt_residue.yaml +48 -0
  45. slopscore_lint-0.4.2/src/slopscore/data/patterns/significance/legacy.yaml +64 -0
  46. slopscore_lint-0.4.2/src/slopscore/data/patterns/suggestions/replacements.yaml +65 -0
  47. slopscore_lint-0.4.2/src/slopscore/detectors/__init__.py +23 -0
  48. slopscore_lint-0.4.2/src/slopscore/detectors/base.py +37 -0
  49. slopscore_lint-0.4.2/src/slopscore/document.py +58 -0
  50. slopscore_lint-0.4.2/src/slopscore/eval/__init__.py +11 -0
  51. slopscore_lint-0.4.2/src/slopscore/eval/datasets.py +51 -0
  52. slopscore_lint-0.4.2/src/slopscore/eval/fairness.py +50 -0
  53. slopscore_lint-0.4.2/src/slopscore/eval/harness.py +74 -0
  54. slopscore_lint-0.4.2/src/slopscore/eval/metrics.py +102 -0
  55. slopscore_lint-0.4.2/src/slopscore/eval/selective.py +39 -0
  56. slopscore_lint-0.4.2/src/slopscore/eval/span_metrics.py +38 -0
  57. slopscore_lint-0.4.2/src/slopscore/features/__init__.py +17 -0
  58. slopscore_lint-0.4.2/src/slopscore/features/_nlp.py +46 -0
  59. slopscore_lint-0.4.2/src/slopscore/features/_ruleset.py +82 -0
  60. slopscore_lint-0.4.2/src/slopscore/features/base.py +49 -0
  61. slopscore_lint-0.4.2/src/slopscore/features/cadence.py +37 -0
  62. slopscore_lint-0.4.2/src/slopscore/features/formatting.py +47 -0
  63. slopscore_lint-0.4.2/src/slopscore/features/formulaic_patterns.py +43 -0
  64. slopscore_lint-0.4.2/src/slopscore/features/human_signals.py +64 -0
  65. slopscore_lint-0.4.2/src/slopscore/features/lexical_markers.py +105 -0
  66. slopscore_lint-0.4.2/src/slopscore/features/phrase_packs.py +54 -0
  67. slopscore_lint-0.4.2/src/slopscore/features/prompt_residue.py +38 -0
  68. slopscore_lint-0.4.2/src/slopscore/features/redundancy.py +42 -0
  69. slopscore_lint-0.4.2/src/slopscore/features/specificity.py +44 -0
  70. slopscore_lint-0.4.2/src/slopscore/features/suggestions.py +67 -0
  71. slopscore_lint-0.4.2/src/slopscore/features/syntactic_tells.py +184 -0
  72. slopscore_lint-0.4.2/src/slopscore/ingest/__init__.py +56 -0
  73. slopscore_lint-0.4.2/src/slopscore/ingest/batch.py +21 -0
  74. slopscore_lint-0.4.2/src/slopscore/ingest/json_source.py +37 -0
  75. slopscore_lint-0.4.2/src/slopscore/ingest/markdown.py +69 -0
  76. slopscore_lint-0.4.2/src/slopscore/ingest/text.py +10 -0
  77. slopscore_lint-0.4.2/src/slopscore/ingest/website.py +30 -0
  78. slopscore_lint-0.4.2/src/slopscore/models.py +180 -0
  79. slopscore_lint-0.4.2/src/slopscore/normalize/__init__.py +16 -0
  80. slopscore_lint-0.4.2/src/slopscore/normalize/clean.py +47 -0
  81. slopscore_lint-0.4.2/src/slopscore/normalize/language.py +30 -0
  82. slopscore_lint-0.4.2/src/slopscore/normalize/offsets.py +72 -0
  83. slopscore_lint-0.4.2/src/slopscore/normalize/segment.py +45 -0
  84. slopscore_lint-0.4.2/src/slopscore/report/__init__.py +9 -0
  85. slopscore_lint-0.4.2/src/slopscore/report/baseline.py +56 -0
  86. slopscore_lint-0.4.2/src/slopscore/report/batch.py +91 -0
  87. slopscore_lint-0.4.2/src/slopscore/report/console.py +129 -0
  88. slopscore_lint-0.4.2/src/slopscore/report/html.py +126 -0
  89. slopscore_lint-0.4.2/src/slopscore/report/json_report.py +9 -0
  90. slopscore_lint-0.4.2/src/slopscore/report/locations.py +36 -0
  91. slopscore_lint-0.4.2/src/slopscore/report/markdown.py +58 -0
  92. slopscore_lint-0.4.2/src/slopscore/report/sarif.py +93 -0
  93. slopscore_lint-0.4.2/src/slopscore/scoring/__init__.py +5 -0
  94. slopscore_lint-0.4.2/src/slopscore/scoring/calibrate.py +102 -0
  95. slopscore_lint-0.4.2/src/slopscore/scoring/confidence.py +69 -0
  96. slopscore_lint-0.4.2/src/slopscore/scoring/model.py +103 -0
  97. slopscore_lint-0.4.2/src/slopscore/scoring/profiles.py +53 -0
  98. slopscore_lint-0.4.2/src/slopscore/scoring/scorer.py +168 -0
  99. slopscore_lint-0.4.2/src/slopscore/scoring/weights.py +49 -0
  100. slopscore_lint-0.4.2/src/slopscore/spans.py +17 -0
  101. slopscore_lint-0.4.2/src/slopscore/suppress.py +100 -0
  102. slopscore_lint-0.4.2/tests/conftest.py +73 -0
  103. slopscore_lint-0.4.2/tests/test_baseline.py +39 -0
  104. slopscore_lint-0.4.2/tests/test_calibrate.py +58 -0
  105. slopscore_lint-0.4.2/tests/test_cli.py +175 -0
  106. slopscore_lint-0.4.2/tests/test_config.py +74 -0
  107. slopscore_lint-0.4.2/tests/test_conservatism.py +48 -0
  108. slopscore_lint-0.4.2/tests/test_detectors.py +42 -0
  109. slopscore_lint-0.4.2/tests/test_eval.py +62 -0
  110. slopscore_lint-0.4.2/tests/test_features.py +58 -0
  111. slopscore_lint-0.4.2/tests/test_human_and_formatting.py +43 -0
  112. slopscore_lint-0.4.2/tests/test_ingest_markdown.py +29 -0
  113. slopscore_lint-0.4.2/tests/test_ingest_other.py +38 -0
  114. slopscore_lint-0.4.2/tests/test_leakage.py +29 -0
  115. slopscore_lint-0.4.2/tests/test_locations.py +38 -0
  116. slopscore_lint-0.4.2/tests/test_model.py +73 -0
  117. slopscore_lint-0.4.2/tests/test_normalize_offsets.py +48 -0
  118. slopscore_lint-0.4.2/tests/test_phrase_packs.py +50 -0
  119. slopscore_lint-0.4.2/tests/test_scorer.py +44 -0
  120. slopscore_lint-0.4.2/tests/test_suggestions.py +55 -0
  121. slopscore_lint-0.4.2/tests/test_suppress.py +76 -0
  122. slopscore_lint-0.4.2/tests/test_syntactic_tells.py +80 -0
  123. slopscore_lint-0.4.2/tests/test_unsupported_claims.py +52 -0
  124. slopscore_lint-0.4.2/uv.lock +4145 -0
@@ -0,0 +1,26 @@
+ name: CI
+
+ on:
+   push:
+     branches: [main]
+   pull_request:
+
+ jobs:
+   gate:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - name: Install uv
+         uses: astral-sh/setup-uv@v5
+         with:
+           python-version: "3.12"
+       - name: Sync (dev deps)
+         run: uv sync
+       - name: Lint
+         run: uv run ruff check .
+       - name: Format check
+         run: uv run ruff format --check .
+       - name: Type check
+         run: uv run mypy src
+       - name: Tests
+         run: uv run pytest -q
@@ -0,0 +1,21 @@
+ name: Docs
+
+ on:
+   push:
+     branches: [main]
+     paths: ["docs/**", "mkdocs.yml", ".github/workflows/docs.yml"]
+   workflow_dispatch:
+
+ permissions:
+   contents: write
+
+ jobs:
+   deploy:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - uses: actions/setup-python@v5
+         with:
+           python-version: "3.12"
+       - run: pip install mkdocs-material
+       - run: mkdocs gh-deploy --force
@@ -0,0 +1,38 @@
+ name: Publish
+
+ on:
+   push:
+     tags:
+       - "v*"
+
+ jobs:
+   build:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - uses: astral-sh/setup-uv@v5
+         with:
+           python-version: "3.12"
+       - name: Build sdist + wheel
+         run: uv build
+       - uses: actions/upload-artifact@v4
+         with:
+           name: dist
+           path: dist/
+
+   publish:
+     # Trusted publishing (OIDC) — no API tokens. Requires a PyPI trusted publisher configured for
+     # this repo + workflow (Settings → Publishing on the PyPI project page).
+     needs: build
+     runs-on: ubuntu-latest
+     environment:
+       name: pypi
+       url: https://pypi.org/p/slopscore-lint
+     permissions:
+       id-token: write
+     steps:
+       - uses: actions/download-artifact@v4
+         with:
+           name: dist
+           path: dist/
+       - uses: pypa/gh-action-pypi-publish@release/v1
@@ -0,0 +1,38 @@
+ # Local working docs (specs, references) — keep out of the repo
+ *.local.md
+
+ # Python
+ __pycache__/
+ *.py[cod]
+ *.egg-info/
+ .eggs/
+ build/
+ dist/
+ *.so
+
+ # Virtual env (uv.lock IS committed for reproducible dev installs)
+ .venv/
+
+ # Test / coverage / type-check caches
+ .pytest_cache/
+ .ruff_cache/
+ .mypy_cache/
+ .coverage*
+ htmlcov/
+ coverage.xml
+
+ # Filesystem-sync duplicate artifacts (e.g. "config_file 2.py")
+ * [0-9].*
+
+ # Editor / OS
+ .idea/
+ .vscode/
+ .DS_Store
+
+ # Claude Code runtime artifacts
+ .claude/scheduled_tasks.lock
+ .slopscore/
+
+ # Filesystem-sync duplicate artifacts (e.g. "file 2.py")
+ *[0-9].py
+ * [0-9].*
@@ -0,0 +1,13 @@
+ repos:
+   - repo: https://github.com/astral-sh/ruff-pre-commit
+     rev: v0.5.7
+     hooks:
+       - id: ruff
+         args: [--fix]
+       - id: ruff-format
+   - repo: https://github.com/pre-commit/mirrors-mypy
+     rev: v1.10.1
+     hooks:
+       - id: mypy
+         additional_dependencies: [pydantic>=2.7, types-pyyaml]
+         files: ^src/
@@ -0,0 +1,10 @@
+ - id: slopscore-lint
+   name: slopscore-lint (AI-slop pattern linter)
+   description: Scan prose for AI-slop writing patterns.
+   entry: slopscore-lint scan
+   language: python
+   types: [text]
+   files: '\.(md|markdown|txt|rst)$'
+   args: ["--fail-on", "high"]
+   pass_filenames: true
+   require_serial: false
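Review note: the `files` value in the hook definition above is a Python regular expression that pre-commit searches for within each candidate path. The sketch below only illustrates which filenames that pattern selects; `hook_would_run` is a hypothetical helper, and it does not model the separate `types: [text]` filter or any `exclude` setting a consuming repository might add.

```python
import re

# Same pattern as the `files` key in .pre-commit-hooks.yaml.
FILES_PATTERN = re.compile(r"\.(md|markdown|txt|rst)$")


def hook_would_run(path: str) -> bool:
    """Return True when the path matches the hook's `files` pattern.

    Hypothetical helper for illustration; pre-commit applies further
    filters (types, exclude) that are not modelled here.
    """
    return FILES_PATTERN.search(path) is not None


if __name__ == "__main__":
    for candidate in ["README.md", "docs/index.rst", "src/cli.py", "notes.md.bak"]:
        print(candidate, hook_would_run(candidate))
```

Because the pattern is anchored with `$`, a backup such as `notes.md.bak` is not selected even though it contains `.md`.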
@@ -0,0 +1,58 @@
+ # Changelog
+
+ All notable changes to slopscore. The PyPI distribution is `slopscore-lint`; the import package
+ and the tool are named `slopscore`.
+
+ ## 0.4.2
+
+ - Scrubbed the README, docs, and model card of the writing patterns the tool flags (em dashes and
+   over-polished verbs), so the published prose passes slopscore itself.
+ - Config: reject a bare string for `disabled_rules` / `disabled_dimensions` (previously it iterated
+   into per-character entries) with a clear error.
+ - CLI: create missing parent directories for `--output` and `baseline -o`; report a friendly error
+   on a malformed `--baseline-file` instead of a traceback; skip non-UTF-8 files in a batch with a
+   warning rather than aborting the run.
+ - Added `CHANGELOG.md` and `SECURITY.md`.
+
+ ## 0.4.1
+
+ - Renamed the PyPI distribution and the CLI command to `slopscore-lint` (the name `slopscore` was
+   already taken on PyPI). The import package stays `slopscore`.
+
+ ## 0.4.0
+
+ - Project config via `slopscore.toml` and `[tool.slopscore]` in `pyproject.toml`, with per-rule and
+   per-dimension toggles and severity overrides (`slopscore-lint config`).
+ - Inline suppression through `<!-- slopscore-disable ... -->` comments.
+ - Findings baseline: `slopscore-lint baseline` plus `scan --baseline-file --fail-on-new` to adopt
+   the linter on an existing repo and gate CI on new findings only.
+ - Implemented the `unsupported_claims` dimension (universal and inflated claims).
+ - Opt-in rewrite suggestions (`--suggest`) with SARIF `fixes`, advisory and never auto-applied.
+ - Authorship-adapter interface (`AuthorshipDetector` protocol) behind the `[detectors]` extra. No
+   detector is bundled; any result is reported separately and never folded into the score.
+ - PyPI trusted-publishing workflow and an mkdocs-material docs site.
+
+ ## 0.3.0
+
+ - Transparent learned scorer (`--scorer ml`): a sign-constrained, calibrated logistic regression
+   over the 13 dimensions, serialized as auditable JSON and run with pure numpy. The rule scorer
+   stays the default under a replace-if-wins gate.
+ - Evaluation harness/framework (`slopscore-lint eval`): TPR@FPR, PR-AUC, calibration, and
+   per-subgroup false-positive rates. See `MODEL_CARD.md` and `DATA_SOURCES.md`.
+
+ ## 0.2.1
+
+ - console/JSON/Markdown/SARIF/HTML reports, recursive and changed-files (`--diff`) batch scanning
+   with CI exit codes, a GitHub Action, and a pre-commit hook.
+
+ ## 0.2.0
+
+ - Detection expansion grounded in Wikipedia's "Signs of AI writing" guide: significance inflation,
+   superficial analysis, weasel attribution, negative parallelism, copula avoidance, formatting
+   tells, and a negative human-writing signal. Conservative scoring with a corroboration gate and
+   abstention on short or non-English input.
+
+ ## 0.1.0
+
+ - Initial release: ingestion (text, Markdown, JSON, websites), offset-preserving normalization, a
+   feature registry, and the first dimensions with evidence spans.
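Review note: the 0.4.2 entry about `disabled_rules` / `disabled_dimensions` describes a common Python pitfall, since a `str` is itself iterable and `list("cadence")` yields single characters. The package's actual validation code is not shown in this part of the diff, so the following is only a minimal sketch of the idea; `normalize_name_list` and its error messages are hypothetical.

```python
from collections.abc import Iterable


def normalize_name_list(key: str, value: object) -> list[str]:
    """Validate a config value that must be a list of names.

    Hypothetical helper, not the package's own implementation.
    """
    # Reject bare strings first: iterating one would silently produce
    # per-character entries such as ['c', 'a', 'd', ...].
    if isinstance(value, (str, bytes)):
        raise ValueError(f"{key} must be a list of strings, got a bare string: {value!r}")
    if not isinstance(value, Iterable):
        raise ValueError(f"{key} must be a list of strings, got {type(value).__name__}")
    names = list(value)
    for name in names:
        if not isinstance(name, str):
            raise ValueError(f"{key} entries must be strings, got {name!r}")
    return names


if __name__ == "__main__":
    print(normalize_name_list("disabled_rules", ["weasel", "copula"]))
    try:
        normalize_name_list("disabled_rules", "weasel")
    except ValueError as exc:
        print(f"config error: {exc}")
```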
@@ -0,0 +1,156 @@
+ # CLAUDE.md
+
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+ ## Commands
+
+ `uv`-managed, src-layout. Common workflows:
+
+ ```bash
+ uv pip install -e . --no-deps # editable install (see install-stability note below)
+ uv run --no-sync slopscore-lint scan FILE # scan a file/URL/'-' (stdin); --format console|json|markdown
+ uv run --no-sync pytest # tests + coverage (pytest imports from src via pythonpath)
+ uv run --no-sync pytest tests/test_scorer.py::test_report_shape # a single test
+ uv run --no-sync ruff check . && uv run --no-sync ruff format --check . # lint + format
+ uv run --no-sync mypy src # type check (strict)
+ ```
+
+ **Install stability:** `uv sync` installs the project NON-editable and the rebuild can land in a
+ broken namespace state (`slopscore.__file__` becomes `None` → `ModuleNotFoundError`). Do an
+ editable install once (`uv pip install -e . --no-deps`) and use `uv run --no-sync` so uv does not
+ re-sync and clobber it. pytest is insulated regardless via `pythonpath = ["src"]` in pyproject.
+
+ Optional features live behind extras: `[web]` (trafilatura), `[nlp]` (spaCy + sentence-transformers),
+ `[lang]` (lingua). Default install is lean; `scan <url>` without `[web]` exits 3 with a hint. For
+ the spaCy path: `uv pip install spacy && uv run --no-sync python -m spacy download en_core_web_sm`
+ (`is_nlp_available()` gates it; syntactic features auto-upgrade when present).
+
+ ## Architecture (v0.2)
+
+ Pipeline in `src/slopscore/`: `ingest/` (text, markdown via marko, json via jsonpath-ng, website)
+ → `normalize/` (ftfy `clean` + offset-preserving `OffsetMapper`, pysbd `segment`, `language`) →
+ `features/` → `scoring/` → `report/`. Orchestrated by `core.py:build_document` then
+ `scoring/scorer.py:score_document`; public API (`SlopScorer`, `scan_text/_path/_url`) in `__init__.py`.
+
+ Reports (v0.2.1): `report/` has console, json, markdown, `sarif.py` (2.1.0, hand-built; severity→
+ level), `html.py` (Jinja2 behind the `[report]` extra, highlighted spans), `batch.py` (directory/
+ multi-file aggregation), and `locations.py` (char→line/col). `Report.original_text` holds the text
+ offsets index into. The `scan` CLI takes multiple targets / a directory, `--recursive`, `--diff
+ <ref>`, `--fail-on {none|low|medium|high}` (exit codes 0/1/2/3), and `--format sarif|html`. CI
+ distribution: `action.yml` (composite) and `.pre-commit-hooks.yaml`.
+
+ Key invariants when extending:
+ - **Every feature is a `Feature`** (`features/base.py`): `extract(doc, profile) -> FeatureResult`
+   with a [0,1] score and `Evidence` spans. Importing `slopscore.features` registers them; add a
+   dimension by writing a class + `register()` AND a field in `models.Dimension`/`Dimensions` AND a
+   weight in `scoring/weights.py`. The scorer iterates the registry.
+ - **Evidence offsets index the original text, not the cleaned text.** Features run on
+   `doc.cleaned_text` and MUST build spans via `doc.evidence(...)`, which maps offsets back through
+   `OffsetMapper`. The round-trip is enforced across the feature tests — keep it green.
+ - **`TextSpan` lives in `spans.py`** (not `document.py`) to avoid a normalize↔document import cycle.
+ - **Conservatism is in the scorer, not the features.** `scoring/scorer.py` applies a corroboration
+   gate (`WEAK_DIMENSIONS` damped when they fire alone), `human_writing_signals` enters with a
+   NEGATIVE weight, and `scoring/confidence.py:abstain_reason` caps the label at "mild" on short/
+   non-English input. Don't make individual features "conservative" — let the scorer do it.
+ - **Rule data is YAML** under `src/slopscore/data/` (force-included into the wheel). `patterns/` is
+   organized into category subdirs loaded by `_ruleset.load_rules_from_directory`; `lexicons/markers.yaml`
+   carries `era`/`source` tags. The spaCy path lives behind `features/_nlp.py`.
+ - Dimensions: lexical_markers, formulaic_structure, significance_inflation, superficial_analysis,
+   weasel_attribution, parallelism, copula_avoidance, genericity, redundancy, cadence_sameness,
+   formatting_tells (weak), prompt_residue, human_writing_signals (negative). unsupported_claims has
+   no feature yet (contributes 0).
+ - **Personal baseline:** `scoring/calibrate.py` builds robust per-dimension stats from a corpus;
+   `scan --baseline <name>` attaches z-score deviations. Profiles (`scoring/profiles.py`) are hand-set
+   (see `PROFILE_NOTES.md`); citations + fairness caveats live in `MODEL_CARD.md`.
+
+ Scoring engines (v0.3): `scoring/scorer.py` dispatches on `Settings.scorer` (`Scorer.rules` default
+ vs `Scorer.ml`). The ML path (`scoring/model.py`) is a pure-numpy logistic model loaded from
+ `data/model/slopscore-v0.3.json` over `FEATURE_ORDER`; sign-constrained (slop dims ≥0, human signal
+ ≤0), Platt-calibrated. The corroboration gate is rules-only; abstention applies to both. Train with
+ `scripts/eval/train.py` (sklearn+scipy, OOF metrics); evaluate with `slopscore-lint eval` / the
+ `slopscore.eval/` package (metrics, fairness, selective, span_metrics). Promotion is gated by
+ `eval/harness.py:should_promote` (TPR@1%FPR + no subgroup-FPR regression) — currently rules wins, so
+ ML stays opt-in. Eval data: `eval/datasets/seed.jsonl` (committed) + `scripts/eval/fetch.py` (large
+ corpora, not committed); licensing in `DATA_SOURCES.md`. **Never train the shipped model on NC data;
+ never import sklearn at scan time** (the ML path is numpy-only).
+
+ Linter maturity (v0.4): `config_file.py` loads `slopscore.toml`/`[tool.slopscore]` via `tomllib`
+ (precedence CLI > slopscore.toml > pyproject > defaults; `resolve_settings` merges, `Settings`
+ carries `disabled_dimensions/rules`, `rule_severity`, `suggest`). The scorer skips disabled
+ dimensions and post-filters evidence for disabled rules, severity overrides, and inline suppression
+ (`suppress.py`, HTML-comment grammar). `report/baseline.py` fingerprints findings for
+ `scan --baseline-file --fail-on-new`. `unsupported_claims` is now a real `_PhrasePack`
+ (`data/patterns/claims/`). Opt-in `--suggest` adds `Evidence.suggestion` + SARIF `fixes`
+ (`features/suggestions.py`, `data/patterns/suggestions/`) — advisory, excluded from score/`--fail-on`
+ (`SUGGEST_*` skipped in `max_severity`). `detectors/` is an interface-only authorship adapter
+ (`AuthorshipDetector` protocol + no-op `ReferenceDetector`); its `DetectorResult` populates a
+ SEPARATE `Report.authorship` field with a mandatory caveat, never the score. **Wheel packaging:**
+ data files ship via hatchling's default package inclusion — do NOT re-add a `force-include` for
+ `data/` (it duplicates paths and breaks `uv build`). PyPI publish is OIDC trusted-publishing on tag
+ (`.github/workflows/publish.yml`); docs are mkdocs-material (`.github/workflows/docs.yml`).
+
+ ## Project state
+
+ v0.1–v0.4 are implemented and green (ruff/mypy/pytest). The repository also holds two reference
+ documents:
+
+ - `BACKGROUND_INFORMATION.local.md` — the authoritative spec. Defines the product concept,
+   what to detect, the scoring model, the planned package layout, dependencies, evaluation
+   plan, and a versioned MVP build plan (v0.1 → v1.0). **Read this before writing code or
+   proposing structure** — it is the source of truth for design decisions.
+ - `AI_WRITING_SLOP_Guide.local.md` — a ~1,650-line catalog of real AI-slop writing examples
+   and patterns. Use it as a corpus of concrete patterns/phrases to detect and as raw material
+   for test fixtures and the evaluation benchmark.
+
+ The `.local.md` suffix marks these as local-only working files. Do not assume they ship with
+ the package or are public.
+
+ ## What this project is (and is not)
+
+ `slopscore` is a transparent **AI-slop pattern detector** — not an AI-authorship detector.
+ This distinction is load-bearing and shapes every API/report decision:
+
+ - It outputs a 0–100 **SlopScore** measuring density of formulaic, generic, low-specificity,
+   over-polished, LLM-associated writing patterns — plus per-dimension scores, a separate
+   confidence score, and **evidence spans** (exact char offsets that triggered each finding).
+ - It must **never** claim "this was written by AI." Any authorship signal (v0.4+ detector
+   adapters) is kept in a separate field (`ai_authorship_signal`), never folded into the
+   `slop_score`. The rationale (detector brittleness, false positives on non-native English,
+   paraphrase evasion) is documented in the spec — preserve that separation.
+ - Positioning is "Vale/ruff for AI-slop writing patterns," not "another GPTZero clone."
+   Conservative by default: prefer false negatives over false accusations.
+
+ ## Key design decisions (from the spec)
+
+ - **Python first**, not Rust. The hard part is NLP feature extraction, calibration, and
+   evaluation iteration — not raw speed. Rust only later for speed-critical parsing if needed.
+ - **Three separate questions, kept distinct:** authorship likelihood (optional, fragile),
+   slop-pattern density (the core score), editorial-quality risk (most useful to writers).
+ - **Heavyweight model deps live behind extras** (`[web]`, `[nlp]`, `[detectors]`, `[all]`).
+   The default install and the default score must be **rule-based and transparent** — no
+   black-box detector in the default path.
+ - **Genre profiles** (`blog`, `essay`, `academic`, `marketing`, `technical`, `social`)
+   reweight dimensions; default `profile=blog`, `strictness=conservative`. The same feature
+   can be legitimate in one genre and slop in another (e.g. "robust" in a technical paper).
+ - **Suppress/heavily qualify scores on short text** (<300 words) and low-confidence inputs
+   (non-English, heavy quotes/code/tables, uncertain web extraction).
+ - **Evaluation from day one.** Credibility depends on shipping a benchmark (human-good,
+   raw-LLM, edited-LLM, human-bad) and reporting TPR at fixed low FPR, span-level
+   precision/recall, and per-domain false-positive rates — not just AUROC.
+
+ ## Roadmap (per spec)
+
+ v0.2: genre profile tuning + `calibrate` (personal baseline from your own corpus), HTML report
+ with highlighted spans, batch/recursive scanning. v0.3: trained interpretable model (logistic
+ regression / LightGBM over the same features). v0.4: optional authorship-signal detector adapters
+ (Binoculars, Fast-DetectGPT) in a separate `ai_authorship_signal` field — never folded into
+ `slop_score`. v1.0: GitHub Action, SARIF output, evaluation benchmark, model card, docs site.
+ See `BACKGROUND_INFORMATION.local.md` for the full plan and the target JSON schema.
+
+ ## Writing discipline (applies to this repo specifically)
+
+ This is a tool that detects AI-slop writing, so its own prose must be exemplary. Scrub all
+ READMEs, docs, docstrings, reports, and commit/PR text for the patterns the tool itself flags:
+ puffery, AI-vocabulary (delve, crucial, pivotal, robust, seamless, leverage, showcase,
+ underscore, tapestry), rule-of-three padding, gratuitous em-dashes, and formulaic scaffolding.
+ Prefer specific, concrete, falsifiable wording. Dogfooding: prose here should pass `slopscore`.
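Review note: CLAUDE.md's v0.4 paragraph states the settings precedence as CLI > slopscore.toml > pyproject > defaults. The body of `resolve_settings` is not visible in this part of the diff, so the sketch below is only one plausible way to realise that ordering as a layered merge. The function name `merge_settings`, the plain-dict representation, and the convention that `None` means "not set" are all assumptions made for illustration.

```python
from collections.abc import Mapping
from typing import Any


def merge_settings(
    defaults: Mapping[str, Any],
    pyproject: Mapping[str, Any],
    slopscore_toml: Mapping[str, Any],
    cli: Mapping[str, Any],
) -> dict[str, Any]:
    """Merge config layers from lowest to highest precedence.

    Hypothetical sketch: later layers win, and a value of None is treated
    as "not provided" so it never masks a lower layer.
    """
    merged: dict[str, Any] = dict(defaults)
    for layer in (pyproject, slopscore_toml, cli):
        merged.update({key: value for key, value in layer.items() if value is not None})
    return merged


if __name__ == "__main__":
    print(
        merge_settings(
            defaults={"profile": "blog", "suggest": False},
            pyproject={"profile": "essay"},
            slopscore_toml={"profile": "technical", "suggest": None},
            cli={"suggest": True},
        )
    )
```

Copying `defaults` before updating keeps the caller's mapping untouched, which matters if the defaults are a shared module-level constant.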
@@ -0,0 +1,40 @@
+ # Evaluation data sources & licensing
+
+ slopscore separates **code** (MIT) from **evaluation data** (mixed upstream licenses) from the
+ **trained model** (weights licensed to match the most-restrictive *training* source). The shipped
+ model is trained only on permissive / CC-BY / CC-BY-SA data, so its weights stay redistributable.
+
+ ## Committed seed set
+
+ `eval/datasets/seed.jsonl` (~54 rows) is a small, hand-authored, original corpus across the four
+ buckets the spec calls for, built by `scripts/eval/build_seed.py`. It is deliberately diverse to
+ limit leakage between the WP:AISIGNS-derived features and the labels. It is enough to exercise the
+ full eval + training pipeline and to back the CI fairness guardrails; it is **not** a substitute
+ for the large corpora below in a serious evaluation.
+
+ | bucket | label | what |
+ |---|---|---|
+ | human_good | 0 | specific, plain, factual prose (incl. a simple/plain-English fairness slice) |
+ | raw_llm | 1 | LLM-style slop: puffery, trailing "-ing" analyses, parallelism, AI vocab |
+ | edited_llm | 1 | slop with concrete details added (harder positives) |
+ | human_bad | 1 | vague human marketing/SEO copy (slop patterns, not AI-generated) |
+
+ ## Large public corpora (fetched, not committed)
+
+ Pulled by `scripts/eval/fetch.py` into `~/.cache/slopscore/`; never redistributed.
+
+ | source | license | use |
+ |---|---|---|
+ | RAID (Dugan et al., ACL 2024) | permissive (verify upstream) | train + eval; paraphrase-robustness |
+ | MAGE (Li et al.) | CC-BY-4.0 | train + eval |
+ | Kobak et al. excess-vocabulary / Wikipedia | CC-BY-SA-3.0 | train + eval; real edited/humanized text |
+ | HC3 (Guo et al., 2023) | **CC-BY-NC-4.0** | **eval-only** — never used to train the shipped model |
+
+ ## Rules we follow
+
+ - The shipped `data/model/slopscore-v0.3.json` is trained **only** on train-eligible (non-NC)
+   sources. NC corpora are loaded for measurement only.
+ - Splits are domain/era-separated where possible to avoid leakage, since the features themselves
+   derive from WP:AISIGNS (see the plan's leakage-guard notes).
+ - Fairness is measured per subgroup (plain/simple English, short text) and reported in
+   `MODEL_CARD.md`; CI fails if subgroup false-positive rates regress.
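Review note: the "train-eligible (non-NC)" rule in DATA_SOURCES.md amounts to a license filter applied before training. The sketch below shows one way such a guard could look, using the four corpora and license strings from the table above. The `Source` dataclass and `is_train_eligible` are hypothetical names, and checking for an `NC` component in the license identifier is an assumed heuristic, not the project's documented mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    license: str


def is_train_eligible(source: Source) -> bool:
    """Return False for non-commercial licenses such as CC-BY-NC-4.0.

    Assumed heuristic: split the license identifier on '-' and look for
    an 'NC' component.
    """
    return "NC" not in source.license.upper().split("-")


# License strings copied from the DATA_SOURCES.md table.
SOURCES = [
    Source("RAID", "permissive (verify upstream)"),
    Source("MAGE", "CC-BY-4.0"),
    Source("Kobak et al. / Wikipedia", "CC-BY-SA-3.0"),
    Source("HC3", "CC-BY-NC-4.0"),
]

if __name__ == "__main__":
    train = [s.name for s in SOURCES if is_train_eligible(s)]
    eval_only = [s.name for s in SOURCES if not is_train_eligible(s)]
    print("train + eval:", train)
    print("eval-only:", eval_only)
```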
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 John Hodge
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,109 @@
+ # slopscore model card (v0.2)
+
+ ## What it does
+
+ slopscore scores text for **AI-slop writing patterns**, formulaic, generic, low-specificity,
+ over-polished prose, and returns a 0-100 SlopScore with per-dimension breakdowns and evidence
+ spans. It is a transparent rule engine: every point comes from a visible rule with a quotable
+ span. It does **not** determine authorship.
+
+ ## What it is not
+
+ It is not an AI-authorship detector and must not be used to accuse a writer. Authorship detectors
+ are unreliable and biased; slopscore deliberately reports patterns, not provenance.
+
+ ## Intended use
+
+ Writers, editors, bloggers, maintainers, and content teams self-checking drafts. Not for
+ punitive or disciplinary decisions about people.
+
+ ## How it scores
+
+ Rule-based features per dimension → each a [0,1] score → weighted sum → sigmoid → 0-100.
+ Conservatism guardrails (v0.2):
+
+ - **Corroboration gate.** Weak-alone tells (lexical markers, parallelism, copula avoidance,
+   formatting) are damped when no other dimension co-fires. A single fancy word or em dash cannot
+   by itself reach "severe".
+ - **Negative signal.** `human_writing_signals` (plain verbs, superlatives, hedges, concrete
+   numbers) lowers the score for specific, plain prose.
+ - **Abstention.** On input under ~100 words, or detected non-English, the label is capped at
+   "mild" and a reason is reported.
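Review note: the "How it scores" pipeline (per-dimension [0,1] scores, weighted sum, sigmoid, 0-100) together with the corroboration gate and the negative human signal can be sketched in a few lines. This is an illustration under stated assumptions, not the code in `scoring/scorer.py`: the weights, the zero bias, the 0.5 damping factor, and the rule that a dimension "fires" when its score is above zero are all hypothetical, and abstention is not modelled.

```python
import math

# Weak-alone dimensions named in the model card; identifiers follow the
# dimension list in CLAUDE.md.
WEAK_DIMENSIONS = frozenset(
    {"lexical_markers", "parallelism", "copula_avoidance", "formatting_tells"}
)


def slop_score(
    scores: dict[str, float],
    weights: dict[str, float],
    bias: float = 0.0,
    damping: float = 0.5,
) -> float:
    """Weighted sum of [0,1] dimension scores -> sigmoid -> 0-100.

    Hypothetical sketch: bias and damping are illustrative values.
    """
    # A dimension counts as fired when it scored above zero and carries a
    # positive weight, so the negative human signal never corroborates.
    fired = {
        name
        for name, value in scores.items()
        if value > 0.0 and weights.get(name, 0.0) > 0.0
    }
    total = bias
    for name, value in scores.items():
        weight = weights.get(name, 0.0)
        # Corroboration gate: damp a weak dimension that fired alone.
        if name in WEAK_DIMENSIONS and not (fired - {name}):
            weight *= damping
        total += weight * value
    return 100.0 / (1.0 + math.exp(-total))


if __name__ == "__main__":
    w = {"lexical_markers": 2.0, "redundancy": 2.0, "human_writing_signals": -1.5}
    print(slop_score({"lexical_markers": 1.0}, w))
    print(slop_score({"lexical_markers": 1.0, "redundancy": 0.5}, w))
    print(
        slop_score(
            {"lexical_markers": 1.0, "redundancy": 0.5, "human_writing_signals": 1.0},
            w,
        )
    )
```

In this sketch a lone weak tell is scored lower than the same tell with a corroborating dimension, and a nonzero `human_writing_signals` value pulls the total down because its weight is negative.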
32
+
33
+ ## Detection grounding (sources)
34
+
35
+ Dimensions and the lexicon are drawn from Wikipedia's "Signs of AI writing" (WP:AISIGNS) and the
36
+ research it cites:
37
+
38
+ - Juzek & Ward, "Why Does ChatGPT 'Delve' So Much?" (arXiv:2412.11385): overused vocabulary.
39
+ - Kobak et al., "Delving into LLM-assisted writing…" (Science Advances 2025): excess vocabulary.
40
+ - Reinhart et al., "Do LLMs write like humans?" (PNAS 2025): present-participle / rhetorical style.
41
+ - Geng & Trotta (arXiv:2404.08627): decline of "is/are" copulas in post-2022 writing.
42
+ - Russell et al. (ACL 2025): humans detect AI near chance; expert LLM-users rely on lexical cues.
43
+
44
+ Vocabulary drifts by model era (GPT-4 → GPT-4o → GPT-5); the lexicon tags terms with their era.
45
+
46
+ ## Limitations and fairness
47
+
48
+ - **Non-native English false positives.** Liang et al. (Patterns 2023) found AI detectors flag
49
+ non-native-English (e.g. TOEFL) essays at up to ~61%. slopscore mitigates with the corroboration
50
+ gate, the negative human signal, and abstention, but residual risk remains. Do not treat a
51
+ high score on plain or non-native English as evidence of anything about the author.
52
+ - **Short text.** Under ~300 words confidence is low; under ~100 the scorer abstains (label capped at "mild").
53
+ - **Genre.** Marketing and travel writing naturally resemble slop; use `--profile` to reweight.
54
+ - **Adversarial edits.** Light paraphrasing evades pattern matching, as it does all detectors.
55
+ - **Coverage.** Wikipedia/markup-specific and authorship-signal tells are intentionally excluded;
56
+ slopscore is a general-prose tool.
57
+
58
+ ## v0.3: learned scorer and evaluation
59
+
60
+ v0.3 adds an evaluation framework (`slopscore-lint eval`) and a transparent learned scorer: a
61
+ **sign-constrained, Platt-calibrated logistic regression** over the 13 interpretable dimensions
62
+ (slop dimensions weight ≥ 0, `human_writing_signals` ≤ 0). It is serialized as auditable JSON
63
+ (`data/model/slopscore-v0.3.json`) and runs with pure numpy at scan time via `--scorer ml`.
64
+
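As a rough sketch of what inference with such a model could look like: the JSON schema below is hypothetical (the real `slopscore-v0.3.json` layout is not shown here), only three of the 13 dimensions appear, two of the dimension names and all numbers are invented, and the code uses the standard library rather than numpy so it runs anywhere.

```python
import json
import math

# Hypothetical model file contents; the real schema and values may differ.
MODEL_JSON = """
{
  "dimensions": ["lexical_markers", "significance_inflation", "human_writing_signals"],
  "weights": [0.9, 1.4, -1.1],
  "intercept": -0.8,
  "platt": {"a": 1.3, "b": -0.2}
}
"""


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def check_sign_constraints(model: dict) -> None:
    """Slop dimensions must have weight >= 0, human_writing_signals <= 0."""
    for name, weight in zip(model["dimensions"], model["weights"]):
        if name == "human_writing_signals":
            if weight > 0:
                raise ValueError(f"{name} must have weight <= 0")
        elif weight < 0:
            raise ValueError(f"{name} must have weight >= 0")


def predict(model: dict, features: dict) -> float:
    """Calibrated probability from per-dimension [0, 1] features."""
    # Raw logistic-regression margin over the interpretable dimensions.
    z = model["intercept"] + sum(
        w * features.get(name, 0.0)
        for name, w in zip(model["dimensions"], model["weights"])
    )
    # Platt scaling: a one-dimensional logistic map applied to the margin.
    platt = model["platt"]
    return sigmoid(platt["a"] * z + platt["b"])


model = json.loads(MODEL_JSON)
check_sign_constraints(model)
```

Because every weight has a fixed sign and the Platt slope is positive in this sketch, raising any slop dimension can only raise the output, and raising `human_writing_signals` can only lower it, which is what keeps the model auditable from the JSON alone.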
65
+ **The rule scorer remains the default.** Under the replace-if-wins gate, the learned model must
66
+ both (a) not lose on TPR@1%FPR and (b) not regress any subgroup false-positive rate. On the
67
+ committed seed set it does neither cleanly:
68
+
69
+ | scorer | TPR@1%FPR | PR-AUC | ECE | simple-English FPR |
70
+ |---|---|---|---|---|
71
+ | rules | 0.80 | 0.96 | 0.14 | 0.00 |
72
+ | ml (out-of-fold) | 0.77 | 0.96 | 0.12 | n/a |
73
+ | ml (in-sample, seed) | 0.80 | 0.98 | 0.06 | **0.62** |
74
+
75
+ The learned model improves calibration but **over-flags plain/simple English** (a fairness
76
+ regression on exactly the population detectors are known to harm) and does not beat the rules on
77
+ held-out TPR@1%FPR. So `--scorer ml` is available and opt-in; `rules` stays default. This is the
78
+ gate working as intended, not a failure.
79
+
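A minimal sketch of the two checks behind the replace-if-wins gate, assuming binary labels (1 = slop) and scores where higher means more slop. The function names and the `simple_english` subgroup key are hypothetical; the actual `slopscore-lint eval` implementation may compute these differently.

```python
from typing import Dict, Sequence


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int], max_fpr: float = 0.01) -> float:
    """Highest true-positive rate over thresholds whose FPR stays <= max_fpr."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    best = 0.0
    # Try each distinct score as a threshold; flag when score >= threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if fp / neg <= max_fpr:
            best = max(best, tp / pos)
    return best


def replace_if_wins(
    rules_tpr: float,
    ml_tpr: float,
    rules_subgroup_fpr: Dict[str, float],
    ml_subgroup_fpr: Dict[str, float],
) -> bool:
    """True only if ml (a) does not lose on TPR and (b) regresses no subgroup FPR."""
    no_loss = ml_tpr >= rules_tpr
    no_regression = all(
        ml_subgroup_fpr[group] <= rules_subgroup_fpr[group]
        for group in rules_subgroup_fpr
    )
    return no_loss and no_regression


# Figures from the table above: out-of-fold TPR for ml, in-sample subgroup FPR.
ml_wins = replace_if_wins(0.80, 0.77, {"simple_english": 0.00}, {"simple_english": 0.62})
```

With the table's figures both conditions fail, so `ml_wins` is `False` and `rules` stays the default.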
80
+ Caveats: these numbers are from the small hand-authored seed set (~54 rows; in-sample for ml
81
+ unless noted out-of-fold). They are illustrative, not a serious benchmark; run `slopscore-lint eval`
82
+ on the fetched public corpora (`scripts/eval/fetch.py`, see `DATA_SOURCES.md`) for real figures.
83
+
84
+ ### Real-corpus experiment (MAGE), and why it validates the design
85
+
86
+ Held-out test split of the committed seed + a fetched MAGE subset (CC-BY; ~1,450 rows total,
87
+ 30% test), via `scripts/eval/experiment.py`:
88
+
89
+ | scorer | TPR@1%FPR | TPR@5%FPR | PR-AUC | ECE |
90
+ |---|---|---|---|---|
91
+ | rules | 0.06 | 0.08 | 0.51 | 0.29 |
92
+ | LR (sign-constrained) | 0.10 | 0.11 | 0.52 | 0.03 |
93
+ | LightGBM (monotone, **experiment only**) | 0.09 | 0.13 | **0.75** | 0.02 |
94
+
95
+ **MAGE labels by authorship (machine vs human), not by slop.** That the slop scorers sit near
96
+ chance at low FPR on MAGE is the design working, not failing: slopscore detects slop *patterns*,
97
+ not provenance, so it should *not* cleanly separate well-written machine text from human text.
98
+ The learned variants improve calibration sharply (ECE 0.29 → 0.02-0.03), and LightGBM extracts
99
+ more authorship signal from the same 13 features nonlinearly (PR-AUC 0.75). We **do not ship
100
+ LightGBM**: it needs trees at scan time (breaking the pure-numpy path), and optimizing it against
101
+ authorship labels would turn slopscore into an authorship detector, the one thing it refuses to be. The **shipped model stays the seed-trained, slop-labeled LR**, and
102
+ the **rule scorer stays the default**. The shipped model is never trained on MAGE.
103
+
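For readers unfamiliar with the ECE column, here is one common way to compute expected calibration error. It is a sketch under assumptions: equal-width binning and ten bins are choices made for this example, and the evaluation scripts may bin differently.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted mean gap between predicted probability and observed positive rate."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Equal-width bins over [0, 1]; p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        confidence = sum(p for p, _ in members) / len(members)  # mean predicted probability
        observed = sum(y for _, y in members) / len(members)    # fraction actually positive
        ece += (len(members) / total) * abs(confidence - observed)
    return ece
```

A scorer that says 0.1 on ten texts of which exactly one is positive contributes no error; one that says 0.9 on texts that are all negative contributes a gap of 0.9. Lower is better, which is the sense in which 0.02-0.03 improves on 0.29.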
104
+ ## Changes from v0.1
105
+
106
+ Added significance inflation, superficial "-ing" analyses, vague/over-attribution, negative
107
+ parallelism / rule-of-three, copula avoidance, formatting tells, and a negative human-writing
108
+ signal; expanded the cited lexicon; added the corroboration gate, abstention, and personal-baseline
109
+ calibration. The default install stays lean (regex + scikit-learn); spaCy precision is behind `[nlp]`.