papercheck 0.3.0 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (144)
  1. papercheck-0.3.0/.gitattributes +3 -0
  2. papercheck-0.3.0/.github/workflows/ci.yml +26 -0
  3. papercheck-0.3.0/.github/workflows/example-paper-audit.yml +40 -0
  4. papercheck-0.3.0/.github/workflows/publish.yml +42 -0
  5. papercheck-0.3.0/.gitignore +18 -0
  6. papercheck-0.3.0/CHANGELOG.md +84 -0
  7. papercheck-0.3.0/CITATION.cff +23 -0
  8. papercheck-0.3.0/CONTRIBUTING.md +77 -0
  9. papercheck-0.3.0/LICENSE +21 -0
  10. papercheck-0.3.0/PKG-INFO +177 -0
  11. papercheck-0.3.0/PROGRESS.md +101 -0
  12. papercheck-0.3.0/README.md +152 -0
  13. papercheck-0.3.0/action.yml +59 -0
  14. papercheck-0.3.0/docs/adapting_to_new_papers.md +15 -0
  15. papercheck-0.3.0/docs/agent_eval.md +116 -0
  16. papercheck-0.3.0/docs/assets/demo-placeholder.svg +28 -0
  17. papercheck-0.3.0/docs/assets/logo.png +0 -0
  18. papercheck-0.3.0/docs/assets/logo.svg +47 -0
  19. papercheck-0.3.0/docs/ci.md +145 -0
  20. papercheck-0.3.0/docs/comparison.md +35 -0
  21. papercheck-0.3.0/docs/failure_modes.md +25 -0
  22. papercheck-0.3.0/docs/limitations.md +57 -0
  23. papercheck-0.3.0/docs/pat_principles.md +10 -0
  24. papercheck-0.3.0/docs/privacy.md +36 -0
  25. papercheck-0.3.0/docs/references.bib +13 -0
  26. papercheck-0.3.0/docs/reviewer_roles.md +37 -0
  27. papercheck-0.3.0/docs/severity_scale.md +11 -0
  28. papercheck-0.3.0/docs/workflow.md +20 -0
  29. papercheck-0.3.0/domain_packs/README.md +47 -0
  30. papercheck-0.3.0/domain_packs/general.yaml +18 -0
  31. papercheck-0.3.0/domain_packs/machine_learning.yaml +18 -0
  32. papercheck-0.3.0/domain_packs/numerical_analysis.yaml +17 -0
  33. papercheck-0.3.0/domain_packs/optimization.yaml +18 -0
  34. papercheck-0.3.0/domain_packs/pde.yaml +18 -0
  35. papercheck-0.3.0/domain_packs/stochastic_analysis.yaml +17 -0
  36. papercheck-0.3.0/eval/RESULTS.md +38 -0
  37. papercheck-0.3.0/eval/findings.json +121 -0
  38. papercheck-0.3.0/eval/run_eval.py +117 -0
  39. papercheck-0.3.0/examples/toy_clean_paper/expected_gate.json +3 -0
  40. papercheck-0.3.0/examples/toy_clean_paper/main.tex +36 -0
  41. papercheck-0.3.0/examples/toy_clean_paper/refs.bib +6 -0
  42. papercheck-0.3.0/profiles/profiles.json +74 -0
  43. papercheck-0.3.0/prompts/00_bootstrap_orchestrator.md +18 -0
  44. papercheck-0.3.0/prompts/01_repository_inspector.md +23 -0
  45. papercheck-0.3.0/prompts/02_build_and_source_hygiene.md +22 -0
  46. papercheck-0.3.0/prompts/03_segmenter_and_budgeter.md +36 -0
  47. papercheck-0.3.0/prompts/04_theorem_inventory.md +20 -0
  48. papercheck-0.3.0/prompts/05_assumption_dependency_auditor.md +20 -0
  49. papercheck-0.3.0/prompts/06_equation_indexer.md +16 -0
  50. papercheck-0.3.0/prompts/07_formalist_proof_auditor.md +30 -0
  51. papercheck-0.3.0/prompts/08_domain_specialist_auditor.md +13 -0
  52. papercheck-0.3.0/prompts/09_main_theorem_chain_auditor.md +30 -0
  53. papercheck-0.3.0/prompts/10_numerical_experiments_auditor.md +25 -0
  54. papercheck-0.3.0/prompts/11_related_work_and_novelty_auditor.md +18 -0
  55. papercheck-0.3.0/prompts/12_notation_consistency_auditor.md +21 -0
  56. papercheck-0.3.0/prompts/13_source_hygiene_auditor.md +29 -0
  57. papercheck-0.3.0/prompts/14_global_synthesis.md +25 -0
  58. papercheck-0.3.0/prompts/15_issue_adjudicator.md +38 -0
  59. papercheck-0.3.0/prompts/16_patch_planner.md +22 -0
  60. papercheck-0.3.0/prompts/17_patcher.md +20 -0
  61. papercheck-0.3.0/prompts/18_regression_auditor.md +23 -0
  62. papercheck-0.3.0/prompts/19_final_acceptance_gate.md +28 -0
  63. papercheck-0.3.0/prompts/20_version_comparison_auditor.md +19 -0
  64. papercheck-0.3.0/pyproject.toml +60 -0
  65. papercheck-0.3.0/schemas/.gitkeep +0 -0
  66. papercheck-0.3.0/schemas/domain_pack.schema.json +28 -0
  67. papercheck-0.3.0/schemas/issue.schema.json +121 -0
  68. papercheck-0.3.0/schemas/manual_check.schema.json +29 -0
  69. papercheck-0.3.0/schemas/patch.schema.json +52 -0
  70. papercheck-0.3.0/schemas/segment.schema.json +55 -0
  71. papercheck-0.3.0/schemas/state.schema.json +65 -0
  72. papercheck-0.3.0/scripts/agent_eval_report.py +184 -0
  73. papercheck-0.3.0/scripts/privacy_check.py +65 -0
  74. papercheck-0.3.0/src/papercheck/__init__.py +3 -0
  75. papercheck-0.3.0/src/papercheck/cli/__init__.py +1 -0
  76. papercheck-0.3.0/src/papercheck/cli/main.py +324 -0
  77. papercheck-0.3.0/src/papercheck/core/__init__.py +1 -0
  78. papercheck-0.3.0/src/papercheck/core/_resources.py +38 -0
  79. papercheck-0.3.0/src/papercheck/core/adjudicate.py +228 -0
  80. papercheck-0.3.0/src/papercheck/core/compare.py +258 -0
  81. papercheck-0.3.0/src/papercheck/core/domainpack.py +148 -0
  82. papercheck-0.3.0/src/papercheck/core/gate.py +178 -0
  83. papercheck-0.3.0/src/papercheck/core/html_report.py +295 -0
  84. papercheck-0.3.0/src/papercheck/core/issues.py +86 -0
  85. papercheck-0.3.0/src/papercheck/core/ledger.py +157 -0
  86. papercheck-0.3.0/src/papercheck/core/paths.py +50 -0
  87. papercheck-0.3.0/src/papercheck/core/profiles.py +51 -0
  88. papercheck-0.3.0/src/papercheck/core/render.py +211 -0
  89. papercheck-0.3.0/src/papercheck/core/schemas.py +45 -0
  90. papercheck-0.3.0/src/papercheck/core/segments.py +198 -0
  91. papercheck-0.3.0/src/papercheck/core/state.py +168 -0
  92. papercheck-0.3.0/src/papercheck/core/texscan.py +801 -0
  93. papercheck-0.3.0/src/papercheck/core/verify.py +66 -0
  94. papercheck-0.3.0/src/papercheck/core/webserve.py +612 -0
  95. papercheck-0.3.0/src/papercheck/mcp_server/__init__.py +1 -0
  96. papercheck-0.3.0/src/papercheck/mcp_server/handlers.py +346 -0
  97. papercheck-0.3.0/src/papercheck/mcp_server/server.py +216 -0
  98. papercheck-0.3.0/templates/assumption_record.md +13 -0
  99. papercheck-0.3.0/templates/equation_record.md +11 -0
  100. papercheck-0.3.0/templates/final_gate.md +11 -0
  101. papercheck-0.3.0/templates/issue.md +23 -0
  102. papercheck-0.3.0/templates/patch_record.md +13 -0
  103. papercheck-0.3.0/templates/report_header.md +11 -0
  104. papercheck-0.3.0/templates/segment_record.md +12 -0
  105. papercheck-0.3.0/templates/theorem_record.md +18 -0
  106. papercheck-0.3.0/tests/fixtures/toy_bad_gronwall_constant/expected.json +8 -0
  107. papercheck-0.3.0/tests/fixtures/toy_bad_gronwall_constant/main.tex +43 -0
  108. papercheck-0.3.0/tests/fixtures/toy_bad_label_refs/expected.json +7 -0
  109. papercheck-0.3.0/tests/fixtures/toy_bad_label_refs/main.tex +30 -0
  110. papercheck-0.3.0/tests/fixtures/toy_bad_label_refs/refs.bib +13 -0
  111. papercheck-0.3.0/tests/fixtures/toy_bad_label_refs/sec.tex +10 -0
  112. papercheck-0.3.0/tests/fixtures/toy_clean_paper/expected_gate.json +3 -0
  113. papercheck-0.3.0/tests/fixtures/toy_clean_paper/main.tex +36 -0
  114. papercheck-0.3.0/tests/fixtures/toy_clean_paper/refs.bib +6 -0
  115. papercheck-0.3.0/tests/fixtures/toy_false_positive_trap/expected.json +6 -0
  116. papercheck-0.3.0/tests/fixtures/toy_false_positive_trap/main.tex +45 -0
  117. papercheck-0.3.0/tests/fixtures/toy_missing_assumption/expected.json +9 -0
  118. papercheck-0.3.0/tests/fixtures/toy_missing_assumption/main.tex +42 -0
  119. papercheck-0.3.0/tests/fixtures/toy_overclaimed_abstract/expected.json +9 -0
  120. papercheck-0.3.0/tests/fixtures/toy_overclaimed_abstract/main.tex +39 -0
  121. papercheck-0.3.0/tests/test_agent_eval_replay.py +78 -0
  122. papercheck-0.3.0/tests/test_cli.py +103 -0
  123. papercheck-0.3.0/tests/test_compare.py +88 -0
  124. papercheck-0.3.0/tests/test_domain_packs.py +132 -0
  125. papercheck-0.3.0/tests/test_domainpack_gen.py +80 -0
  126. papercheck-0.3.0/tests/test_fixtures.py +60 -0
  127. papercheck-0.3.0/tests/test_gate.py +85 -0
  128. papercheck-0.3.0/tests/test_handlers.py +167 -0
  129. papercheck-0.3.0/tests/test_html_report.py +90 -0
  130. papercheck-0.3.0/tests/test_issues_intake.py +100 -0
  131. papercheck-0.3.0/tests/test_ledger.py +68 -0
  132. papercheck-0.3.0/tests/test_mcp_server.py +46 -0
  133. papercheck-0.3.0/tests/test_packaging.py +14 -0
  134. papercheck-0.3.0/tests/test_profiles.py +36 -0
  135. papercheck-0.3.0/tests/test_prompts_pack.py +84 -0
  136. papercheck-0.3.0/tests/test_render.py +75 -0
  137. papercheck-0.3.0/tests/test_schemas.py +120 -0
  138. papercheck-0.3.0/tests/test_segments.py +75 -0
  139. papercheck-0.3.0/tests/test_skeleton.py +28 -0
  140. papercheck-0.3.0/tests/test_state.py +81 -0
  141. papercheck-0.3.0/tests/test_texscan.py +96 -0
  142. papercheck-0.3.0/tests/test_texscan_ast.py +182 -0
  143. papercheck-0.3.0/tests/test_verify.py +68 -0
  144. papercheck-0.3.0/tests/test_webserve.py +168 -0
papercheck-0.3.0/.gitattributes
@@ -0,0 +1,3 @@
+ * text=auto eol=lf
+ *.png binary
+ *.pdf binary
papercheck-0.3.0/.github/workflows/ci.yml
@@ -0,0 +1,26 @@
+ name: CI
+
+ on:
+   pull_request:
+   push:
+
+ jobs:
+   test:
+     strategy:
+       fail-fast: false
+       matrix:
+         os: [ubuntu-latest, windows-latest]
+     runs-on: ${{ matrix.os }}
+     steps:
+       - uses: actions/checkout@v4
+       - uses: actions/setup-python@v5
+         with:
+           python-version: "3.12"
+       - name: Install
+         run: python -m pip install -e ".[dev]"
+       - name: Ruff
+         run: ruff check .
+       - name: Pytest
+         run: pytest -q
+       - name: Privacy check
+         run: python scripts/privacy_check.py
papercheck-0.3.0/.github/workflows/example-paper-audit.yml
@@ -0,0 +1,40 @@
+ name: "Papercheck Mechanical Audit"
+
+ # Trigger on pull requests and manual workflow runs
+ on:
+   pull_request:
+   workflow_dispatch:
+
+ jobs:
+   mechanical-audit:
+     runs-on: ubuntu-latest
+     steps:
+       - name: Checkout repository
+         uses: actions/checkout@v4
+
+       # USAGE OPTION 1: Use the published GitHub Action from the papercheck repo
+       # (requires papercheck to be published and the action to be in the OWNER/papercheck repo)
+       # Uncomment the step below and comment out USAGE OPTION 2:
+       #
+       # - name: Run papercheck mechanical audit
+       #   uses: OWNER/papercheck@v0.2.0
+       #   with:
+       #     paper-root: "."
+
+       # USAGE OPTION 2: Install and run papercheck directly in the workflow
+       # Use this if you haven't published the action to GitHub yet, or for testing.
+       # Replace "." with the path to your paper root (e.g., "paper/" if sources are in a subdirectory).
+       - name: Run papercheck mechanical audit
+         uses: actions/checkout@v4 # This ensures we have the latest code
+       - name: Set up Python
+         uses: actions/setup-python@v5
+         with:
+           python-version: "3.12"
+       - name: Install papercheck
+         run: pip install papercheck
+       - name: Run scan
+         run: papercheck scan "."
+       - name: Run segments
+         run: papercheck segments "."
+       - name: Run gate check
+         run: papercheck gate "." --mechanical-only
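The last step of the workflow above passes or fails the job on the gate's exit status; the README's command table states that exit 0 means READY, and the CHANGELOG describes the mechanical gate as a code-enforced READY / NOT READY verdict built from build status, label/ref/cite counts, draft markers, and open blockers. The sketch below shows one way such a verdict could be computed. The `GateSignals` record, the reason strings, the rule that any single failing signal blocks, and exit code 1 for NOT READY are all assumptions for illustration, not the contents of `core/gate.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateSignals:
    """Counted mechanical signals (hypothetical field names, not papercheck's)."""

    build_ok: bool
    unresolved_refs: int
    undefined_citations: int
    duplicate_labels: int
    draft_markers: int
    open_blockers: int


def mechanical_verdict(s: GateSignals) -> tuple[str, list[str]]:
    """Return (verdict, reasons); assumes any failing signal blocks READY."""
    checks = [
        (not s.build_ok, "LaTeX build failed"),
        (s.unresolved_refs > 0, f"{s.unresolved_refs} unresolved reference(s)"),
        (s.undefined_citations > 0, f"{s.undefined_citations} undefined citation(s)"),
        (s.duplicate_labels > 0, f"{s.duplicate_labels} duplicate label(s)"),
        (s.draft_markers > 0, f"{s.draft_markers} draft marker(s)"),
        (s.open_blockers > 0, f"{s.open_blockers} open blocker(s)"),
    ]
    reasons = [message for failed, message in checks if failed]
    return ("NOT READY" if reasons else "READY"), reasons


def exit_code(verdict: str) -> int:
    # README: exit 0 = READY. Using 1 for every other verdict is an assumption.
    return 0 if verdict == "READY" else 1
```

Written as a pure function of counted signals, a gate like this returns the same exit code for the same sources, which is the property a CI step needs.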
papercheck-0.3.0/.github/workflows/publish.yml
@@ -0,0 +1,42 @@
+ name: Publish to PyPI
+
+ # Publishes papercheck to PyPI when a GitHub Release is published, using PyPI
+ # Trusted Publishing (OpenID Connect) — no API token or password is stored.
+ # The matching trusted publisher must be registered on PyPI:
+ # project: papercheck owner: cgarryZA repo: papercheck
+ # workflow: publish.yml environment: pypi
+
+ on:
+   release:
+     types: [published]
+   workflow_dispatch:
+
+ jobs:
+   build-and-publish:
+     name: Build and publish to PyPI
+     runs-on: ubuntu-latest
+     environment:
+       name: pypi
+       url: https://pypi.org/p/papercheck
+     permissions:
+       id-token: write # required for OIDC trusted publishing
+       contents: read
+     steps:
+       - uses: actions/checkout@v4
+
+       - uses: actions/setup-python@v5
+         with:
+           python-version: "3.12"
+
+       - name: Build sdist and wheel
+         run: |
+           python -m pip install --upgrade build
+           python -m build
+
+       - name: Check the built distributions
+         run: |
+           python -m pip install --upgrade twine
+           twine check dist/*
+
+       - name: Publish to PyPI
+         uses: pypa/gh-action-pypi-publish@release/v1
papercheck-0.3.0/.gitignore
@@ -0,0 +1,18 @@
+ __pycache__/
+ *.egg-info/
+ .pytest_cache/
+ build/
+ dist/
+ dist_check/
+ *.whl
+ .ruff_cache/
+ Paper_Audit/
+
+ # LaTeX build products (e.g. generated by the gate's latexmk build in fixtures)
+ *.aux
+ *.fdb_latexmk
+ *.fls
+ *.log
+ *.pdf
+ *.out
+ *.synctex.gz
papercheck-0.3.0/CHANGELOG.md
@@ -0,0 +1,84 @@
+ # Changelog
+
+ All notable changes to papercheck are documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## [0.3.0] — 2026-07-02
+
+ ### Added
+
+ - **LaTeX-AST scanner** — `texscan.scan` now builds a real LaTeX node tree
+ (via `pylatexenc`) instead of regex/line matching, with a graceful regex
+ fallback on malformed input. Output schema is unchanged.
+ - **Interactive web UI** — `papercheck serve <paper_root>` starts a local
+ stdlib HTTP server with a filterable issue table (by status/severity/category)
+ and click-to-source; jailed source reader, XSS-safe rendering.
+
+ ### Fixed
+
+ - Single-line `\begin{theorem}\label{...}...\end{theorem}` now attributes the
+ label to the theorem environment; macro-defined (`\newtheorem`) and nested
+ environments are handled.
+
+ ## [0.2.0] — 2026-07-02
+
+ ### Added
+
+ - **HTML audit report** — `papercheck report <paper_root>` writes a
+ self-contained HTML report to `Paper_Audit/report/index.html` with verdict
+ banner, issue table, segments, and manual checks.
+ - **Version-compare command** — `papercheck compare <old_root> <new_root>`
+ produces a structural diff of two paper versions (theorems by label,
+ abstract, labels, citations, equations) and writes `Paper_Audit/version_comparison.md`.
+ - **Audit profiles** — `papercheck profile [list|show <name>]` exposes advisory
+ audit profiles (quick, arxiv, full, journal, no-cloud), each with an ordered
+ sequence of recommended steps and a mechanical_only flag. Profiles are
+ advisory only; papercheck does not execute them automatically.
+ - **Domain pack generator and CLI** — `papercheck packs [list|show <name>|scaffold
+ --paper-root <root>|create <file.json> --paper-root <root>]` enables building
+ paper-specific domain packs. Scaffold deterministically extracts structure;
+ create validates and persists a filled-in pack to `Paper_Audit/domain_pack.json`.
+ - **Four new MCP tools** — `list_domain_packs`, `get_domain_pack`,
+ `scaffold_domain_pack`, `create_domain_pack`. Agents fill in domain knowledge;
+ papercheck validates and stores it.
+ - **New shipped domain packs** — `pde`, `optimization`, `machine_learning`,
+ `general` (fully generic) join existing `stochastic_analysis`, `numerical_analysis`.
+ - **End-user GitHub Action** — composite action in `action.yml` + example
+ workflow (`.github/workflows/example-paper-audit.yml`) + documentation
+ (`docs/ci.md`) for running mechanical-only checks (scan, segments, gate
+ --mechanical-only) in CI without sending the manuscript to external services.
+ - **29 MCP tools** (up from 25) — all new tools wired to the stdio MCP server.
+ - **118 mechanical tests** — all passing; includes 31 new tests for v0.2 features.
+
+ ## [0.1.0] — 2026-06-01
+
+ ### Added
+
+ - **Deterministic LaTeX scanner** — regex-based structure extraction producing
+ `structure.json` (theorems, labels, refs, citations, draft markers) with no
+ guessing and no network calls.
+ - **Stage-gated state machine** — fixed workflow stages (INIT → SCANNED →
+ SEGMENTED → INVENTORIED → AUDITING → SYNTHESIZED → ADJUDICATED →
+ PATCH_PLANNED → PATCHING → REGRESSED → GATED) enforced in code.
+ - **Schema-validated ledgers** — JSON Schemas for issues, patches, segments,
+ manual checks, and state; unverifiable findings rejected as
+ `REJECTED_SOURCE_TARGET_INVALID` before entering the ledger.
+ - **Verified issue intake** — exact quote verification, label/ref/cite/file
+ checking, with unverifiable issues rejected mechanically.
+ - **Mechanical final gate** — code-enforced verdict (READY / NOT READY) based on
+ build status, label/ref/cite counts, draft markers, and open blockers.
+ - **Markdown report rendering** — JSON-to-Markdown for all workflow stages
+ (e.g., `10_final_acceptance_gate.md`).
+ - **Typer CLI** — 7 commands (scan, segments, render, init, submit-issue,
+ adjudicate, gate) with full help text.
+ - **FastMCP stdio server** — 25 tools exposing the deterministic core to any
+ MCP-capable agent (e.g., Claude Code).
+ - **Vendored prompt pack** — domain-agnostic auditing prompt in `prompts/`.
+ - **Shipped domain packs** — `stochastic_analysis`, `numerical_analysis` in
+ `domain_packs/` with categorized theorem types and technique keywords.
+ - **Evaluation fixtures** — toy papers (clean, draft-marked, unresolved refs)
+ for testing and agent-eval workflows.
+ - **CI + packaging** — green `ruff` lint, 87 mechanical tests, privacy check,
+ wheel with bundled data dirs (schemas, prompts, templates, domain_packs).
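The 0.1.0 entry above describes a regex-based scanner that extracts labels, references, citations, and draft markers, and 0.3.0 replaces it with a `pylatexenc` AST while keeping a regex fallback. A minimal sketch of the regex approach follows, assuming a single in-memory `.tex` string plus optional `.bib` text. The patterns, the draft-marker words, and the output keys are illustrative assumptions rather than the real `structure.json` schema; the sketch does not follow `\input`/`\include`, does not skip commented-out lines, and does not expand custom macros.

```python
import re

# Illustrative patterns only; papercheck's own scanner (AST or regex fallback) differs.
LABEL_RE = re.compile(r"\\label\{([^}]+)\}")
REF_RE = re.compile(r"\\(?:eq|c|C|auto|page)?ref\{([^}]+)\}")
CITE_RE = re.compile(r"\\cite[tp]?\*?(?:\[[^\]]*\])*\{([^}]+)\}")
BIB_KEY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")
DRAFT_RE = re.compile(r"\bTODO\b|\bFIXME\b|\\todo\b")


def _keys(pattern: re.Pattern[str], text: str) -> list[str]:
    """Collect keys from every match, splitting comma lists such as \\cite{a,b}."""
    keys: list[str] = []
    for match in pattern.finditer(text):
        keys.extend(k.strip() for k in match.group(1).split(",") if k.strip())
    return keys


def scan(tex: str, bib: str = "") -> dict:
    """Summarize the structure of one in-memory .tex file (sketch only)."""
    seen: set[str] = set()
    duplicates: set[str] = set()
    for label in _keys(LABEL_RE, tex):
        # A label seen before is a duplicate; otherwise record it as defined.
        (duplicates if label in seen else seen).add(label)
    refs = set(_keys(REF_RE, tex))
    cites = set(_keys(CITE_RE, tex))
    bib_keys = set(BIB_KEY_RE.findall(bib))
    return {
        "labels": sorted(seen),
        "duplicate_labels": sorted(duplicates),
        "unresolved_refs": sorted(refs - seen),
        "citations": sorted(cites),
        "undefined_citations": sorted(cites - bib_keys),
        # (1-based line number, marker text) for each draft marker.
        "draft_markers": [
            (tex.count("\n", 0, m.start()) + 1, m.group(0))
            for m in DRAFT_RE.finditer(tex)
        ],
    }
```

Counting newlines before each match keeps line numbers exact without splitting the file, which matters when findings must later point at a precise file and line.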
papercheck-0.3.0/CITATION.cff
@@ -0,0 +1,23 @@
+ cff-version: 1.2.0
+ message: "If you use papercheck, please cite it using these metadata."
+ title: papercheck
+ type: software
+ version: 0.3.0
+ license: MIT
+ abstract: >-
+   papercheck is a reproducible audit harness for mathematical LaTeX papers. It
+   turns a paper into a stage-gated adversarial audit: deterministic structure
+   extraction, JSON-Schema-validated issue ledgers with mechanical quote
+   verification, and a code-enforced final gate that any MCP-capable agent can
+   drive but cannot skip. It does not itself call an LLM and is not a theorem
+   prover or a replacement for peer review.
+ authors:
+   - name: "papercheck contributors"
+ keywords:
+   - latex
+   - mathematics
+   - research-software
+   - reproducibility
+   - paper-review
+   - mcp
+   - audit
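The abstract's "mechanical quote verification" amounts to a containment check that tolerates line-wrapping differences; PROGRESS.md calls it a whitespace-normalized quote verifier. The sketch below collapses whitespace on both sides, confines the cited file to the paper root, and returns the `REJECTED_SOURCE_TARGET_INVALID` status named in the CHANGELOG when the quote cannot be found. The `Finding` shape and `intake_status` function are hypothetical, returning `PROPOSED` on success is an assumption based on the README's "proposed ledger", and the real intake also checks labels, refs, and citations.

```python
import re
from dataclasses import dataclass
from pathlib import Path

_WHITESPACE = re.compile(r"\s+")
REJECTED = "REJECTED_SOURCE_TARGET_INVALID"


def normalize(text: str) -> str:
    """Collapse each run of spaces, tabs, and newlines into a single space."""
    return _WHITESPACE.sub(" ", text).strip()


@dataclass(frozen=True)
class Finding:
    # Hypothetical minimal shape; the real issue schema has many more fields.
    file: str  # path relative to the paper root
    quote: str  # text the finding claims appears in that file


def intake_status(finding: Finding, paper_root: Path) -> str:
    """Return "PROPOSED" only if the quote occurs in the file, modulo whitespace."""
    root = paper_root.resolve()
    source = (root / finding.file).resolve()
    # Refuse paths that escape the paper root (e.g. "../other.tex") or do not exist.
    if not source.is_relative_to(root) or not source.is_file():
        return REJECTED
    quote = normalize(finding.quote)
    if not quote:
        return REJECTED  # an empty quote would trivially "match" anything
    text = source.read_text(encoding="utf-8", errors="replace")
    return "PROPOSED" if quote in normalize(text) else REJECTED
```

Normalizing both sides means a quote copied across a line break still verifies, while any change to the actual words, such as `$e^{T}$` versus `$e^{LT}$`, is rejected.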
papercheck-0.3.0/CONTRIBUTING.md
@@ -0,0 +1,77 @@
+ # Contributing to papercheck
+
+ Thanks for your interest. papercheck is small, deterministic, and deliberately
+ scoped. Please read the discipline notes below before opening a PR — they are
+ not negotiable for the v0.1 line.
+
+ ## Setup
+
+ ```bash
+ python -m pip install -e ".[dev]"
+ ```
+
+ This installs papercheck in editable mode with the dev extras (`pytest`,
+ `ruff`).
+
+ ## Run the tests
+
+ ```bash
+ pytest -q
+ ```
+
+ The suite is fully mechanical — no LLM calls, no network. It must stay green on
+ both Linux and Windows.
+
+ ## Lint
+
+ ```bash
+ ruff check .
+ ```
+
+ CI runs the same command; keep it clean.
+
+ ## Privacy check (required)
+
+ ```bash
+ python scripts/privacy_check.py
+ ```
+
+ This must exit 0. It scans the repository for forbidden identifiers tied to a
+ specific unpublished paper. **Never add paper-specific unpublished content** —
+ manuscripts, real author names, private quotes, or anything that identifies a
+ non-public paper — to this repository. Toy fixtures under `tests/fixtures/` are
+ synthetic and self-contained by design; keep them that way.
+
+ ## Phase / spec discipline (v0.1)
+
+ The architecture is fixed for the v0.1 line. PRs that change these will be
+ declined:
+
+ - **No LLM provider code.** papercheck never calls a model API. The reasoning
+ lives in the driving agent (Claude Code or any MCP client) or in a human. Do
+ not add provider adapters, API clients, or keys.
+ - **JSON is the source of truth.** Issues, patches, segments, manual checks, and
+ state are JSON validated against the schemas in `schemas/`. Markdown reports
+ are *rendered from* JSON, never authored directly.
+ - **Gates are enforced in code.** Stage ordering and the final gate live in the
+ Python core, not in prompts. Do not move enforcement into agent instructions.
+
+ If you have a change that needs one of these, open an issue to discuss it as a
+ future major-version direction first (see the deferred list in `PROGRESS.md`).
+
+ ## How prompts are tested
+
+ Prompts and the audit workflow are validated in two tiers:
+
+ - **Tier 1 — mechanical fixtures.** The pytest suite exercises the deterministic
+ core against the toy fixtures in `tests/fixtures/` (e.g. the label/ref fixture
+ is mechanically detectable). This runs in CI.
+ - **Tier 2 — agent-eval.** A human or MCP agent runs the full audit workflow
+ against the *semantic* fixtures and checks that planted defects are found and
+ survive adjudication, and that the false-positive trap is correctly rejected.
+ This is **not** run in CI (it requires an LLM). See
+ [`docs/agent_eval.md`](docs/agent_eval.md) for the exact procedure, and use
+ `scripts/agent_eval_report.py` to summarize a completed audit.
+
+ When you change a prompt, describe in your PR which fixtures you re-ran under
+ Tier 2 and what you observed.
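The privacy section above requires `scripts/privacy_check.py` to exit 0. Its term list is deliberately not shown in this diff, so the following is only a sketch of how a forbidden-identifier scan of that kind could work: walk the tree, skip VCS and cache directories and binary files, match terms case-insensitively line by line, and report a nonzero status on any hit. The placeholder terms, the skip lists, the `exclude` parameter, and the function names are all assumptions.

```python
import sys
from pathlib import Path

# Placeholder terms only; the real script's list is intentionally not public.
FORBIDDEN = ("example-unpublished-title", "jane q. author")
SKIP_DIRS = {".git", "__pycache__", ".pytest_cache", ".ruff_cache", "build", "dist"}
SKIP_SUFFIXES = {".png", ".pdf", ".whl"}


def find_violations(
    root: Path,
    forbidden: tuple[str, ...] = FORBIDDEN,
    exclude: frozenset[str] = frozenset(),
) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, term) for every case-insensitive hit."""
    terms = [t.lower() for t in forbidden]
    hits: list[tuple[str, int, str]] = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or path.suffix.lower() in SKIP_SUFFIXES:
            continue
        # `exclude` lets the checker skip the file that stores the term list.
        if rel.as_posix() in exclude or any(p in SKIP_DIRS for p in rel.parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # treat undecodable or unreadable files as binary
        for lineno, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            hits.extend((rel.as_posix(), lineno, t) for t in terms if t in lowered)
    return hits


def main(root: str = ".") -> int:
    """Print each hit to stderr; return 0 when clean, 1 otherwise."""
    hits = find_violations(Path(root))
    for file, lineno, term in hits:
        print(f"{file}:{lineno}: forbidden identifier {term!r}", file=sys.stderr)
    return 1 if hits else 0
```

Used as a script, the module would end with `sys.exit(main())` so that CI fails the job on any hit, matching the "must exit 0" rule.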
papercheck-0.3.0/LICENSE
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 papercheck contributors
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,177 @@
1
+ Metadata-Version: 2.4
2
+ Name: papercheck
3
+ Version: 0.3.0
4
+ Summary: A reproducible audit harness for mathematical LaTeX papers.
5
+ Project-URL: Homepage, https://github.com/OWNER-PLACEHOLDER/papercheck
6
+ Project-URL: Repository, https://github.com/OWNER-PLACEHOLDER/papercheck
7
+ License: MIT
8
+ License-File: LICENSE
9
+ Keywords: audit,latex,mathematics,mcp,paper-review,reproducibility,research-software
10
+ Classifier: Intended Audience :: Science/Research
11
+ Classifier: License :: OSI Approved :: MIT License
12
+ Classifier: Programming Language :: Python :: 3.10
13
+ Classifier: Programming Language :: Python :: 3.11
14
+ Classifier: Programming Language :: Python :: 3.12
15
+ Classifier: Topic :: Scientific/Engineering :: Mathematics
16
+ Requires-Python: >=3.10
17
+ Requires-Dist: jsonschema>=4.0
18
+ Requires-Dist: mcp>=1.0
19
+ Requires-Dist: pylatexenc>=2.10
20
+ Requires-Dist: typer>=0.12
21
+ Provides-Extra: dev
22
+ Requires-Dist: pytest>=8; extra == 'dev'
23
+ Requires-Dist: ruff>=0.6; extra == 'dev'
24
+ Description-Content-Type: text/markdown
25
+
26
+ <div align="center">
27
+
28
+ <img src="docs/assets/logo.png" alt="papercheck" width="680"/>
29
+
30
+ <br/>
31
+
32
+ A reproducible audit harness for mathematical LaTeX papers.
33
+
34
+ <a href="https://github.com/cgarryZA/papercheck/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-22c55e"></a>
35
+ <a href="https://www.python.org/"><img src="https://img.shields.io/badge/python-3.10%2B-3776AB?logo=python&logoColor=white"></a>
36
+ <img src="https://img.shields.io/badge/tests-140_passing-16a34a">
37
+ <img src="https://img.shields.io/badge/version-0.3.0-8b5cf6">
38
+ <a href="https://modelcontextprotocol.io"><img src="https://img.shields.io/badge/MCP-server-ff6d00"></a>
39
+
40
+ </div>
41
+
42
+ papercheck extracts a paper's structure, records findings in schema-validated ledgers that are checked against the exact source text, and runs a final gate that reports whether the paper is ready. It runs as a command-line tool, or as an MCP server that an agent (Claude Code, Codex, Cursor, or similar) drives while you supply the mathematical judgement.
43
+
44
+ The point is discipline. Instead of asking one model to "review the paper," papercheck segments the manuscript, inventories the claims, runs narrow hostile audits, verifies each finding against the source, adjudicates, patches only what was accepted, and gates. The gates are enforced in code: an agent cannot patch before adjudication, and a finding whose quote does not appear in the source never enters the ledger.
45
+
46
+ The staged pipeline — segment, budget, specialist review, synthesize, adjudicate — follows the approach described in Google Research's Paper Assistant Tool (PAT) [[1]](#references), adapted into a small, open, model-agnostic harness that runs no models of its own.
47
+
48
+ ## What it is
49
+
50
+ A deterministic Python core with two frontends — a CLI and an MCP server. papercheck itself never calls a model; the reasoning lives in the agent or in your hands at the CLI. What the core provides:
51
+
52
+ - LaTeX-AST structure extraction (theorems, labels, references, citations, equations, draft markers) into `structure.json`.
53
+ - Schema-validated issue ledgers, each finding tied to a file and line.
54
+ - Quote verification at intake: if a finding's quoted text is not in the source, it is rejected as `REJECTED_SOURCE_TARGET_INVALID` before it reaches the proposed ledger.
55
+ - A stage-gated state machine (`INIT → SCANNED → … → ADJUDICATED → … → GATED`) that refuses out-of-order operations.
56
+ - A CLI, an MCP server (29 tools), and a local web UI.
57
+
58
+ ## Why
59
+
60
+ A single "review my paper" prompt tends to fail in three ways. papercheck handles each in code rather than in prompting:
61
+
62
+ | Failure mode | What usually happens | How papercheck handles it |
63
+ | --- | --- | --- |
64
+ | Hallucinated findings | the model invents a problem that isn't in the text | `submit_issue` matches the quoted text against the source; no match means the finding is rejected before it enters the ledger |
65
+ | Patch before proof | the model rewrites before anyone knows what is real | patches are refused unless the state is at `ADJUDICATED` and the issue is `ACCEPTED` |
66
+ | Skipped gate | "looks good to me" with no audit trail | the final gate is computed from mechanical signals and accepted blockers, and returns one of a fixed set of verdicts |
67
+
68
+ ## Quickstart
69
+
70
+ ```bash
71
+ pipx install papercheck # or: pip install papercheck
72
+
73
+ papercheck scan path/to/paper # extract structure.json
74
+ papercheck segments path/to/paper # propose audit segments and budgets
75
+ papercheck gate path/to/paper --mechanical-only
76
+ papercheck report path/to/paper # self-contained HTML report
77
+ papercheck serve path/to/paper # local web UI
78
+ ```
79
+
80
+ Run it on the bundled example (prints `==== READY ====`):
81
+
82
+ ```bash
83
+ papercheck gate examples/toy_clean_paper --mechanical-only
84
+ ```
85
+
86
+ ## Driving it from an agent (MCP)
87
+
88
+ papercheck ships a FastMCP server exposing 29 tools plus the audit prompt pack. Register it once:
89
+
90
+ ```bash
91
+ claude mcp add papercheck -- papercheck-mcp
92
+ ```
93
+
94
+ Then ask the agent to audit a paper. It walks the workflow through the MCP tools — `init_audit`, `run_scan`, `propose_segments`, `submit_issue`, `adjudicate_issue`, `run_gate` — under the same code-enforced gates as the CLI.
95
+
96
+ <details>
97
+ <summary>Generating a paper-specific domain pack</summary>
98
+
99
+ <br/>
100
+
101
+ papercheck does not call a model, so tailoring the audit to a paper's field is a two-step split: the agent reads the paper and drafts the pack, papercheck validates and stores it.
102
+
103
+ ```bash
104
+ papercheck packs scaffold --paper-root ./paper # deterministic draft from the scan
105
+ papercheck packs create draft.json --paper-root ./paper # validated -> Paper_Audit/domain_pack.json
106
+ ```
107
+
108
+ The same is available through the `scaffold_domain_pack` and `create_domain_pack` MCP tools. Generic packs ship for stochastic analysis, PDE, numerical analysis, optimization, machine-learning theory, and a general fallback.
109
+
110
+ </details>
111
+
112
+ ## How it works
113
+
114
+ ```mermaid
115
+ flowchart LR
116
+ A[INIT] --> B[SCANNED] --> C[SEGMENTED] --> D[INVENTORIED] --> E[AUDITING]
117
+ E --> F[SYNTHESIZED] --> G[ADJUDICATED] --> H[PATCH_PLANNED]
118
+ H --> I[PATCHING] --> J[REGRESSED] --> K[GATED]
119
+
120
+ E -. submit_issue .-> V{quote and label verified?}
121
+ V -- no --> X[REJECTED_SOURCE_TARGET_INVALID]
122
+ V -- yes --> L[PROPOSED ledger]
123
+ style X fill:#7f1d1d,stroke:#ef4444,color:#fff
124
+ style K fill:#14532d,stroke:#22c55e,color:#fff
125
+ ```
126
+
127
+ One deterministic core, two frontends, no LLM calls inside the harness. papercheck stays strictly an MCP server and CLI — no provider adapters, no self-orchestration — so it works with any model and makes no network calls of its own.
128
+
129
+ ## Commands
130
+
131
+ | Command | What it does |
132
+ | --- | --- |
133
+ | `papercheck init` | create the `Paper_Audit/` workspace and state file |
134
+ | `papercheck scan` | LaTeX-AST structure extraction into `structure.json` |
135
+ | `papercheck segments` | heuristic segment and budget proposal |
136
+ | `papercheck gate [--mechanical-only]` | compute the final verdict (exit 0 = READY) |
137
+ | `papercheck verify-quote` | check a quote against a source file |
138
+ | `papercheck report` | self-contained HTML audit report |
139
+ | `papercheck serve` | local web UI (filter issues, click through to source) |
140
+ | `papercheck compare old/ new/` | structural diff of two paper versions |
141
+ | `papercheck profile list\|show` | advisory audit pipelines (`quick`, `arxiv`, `full`, …) |
142
+ | `papercheck packs …` | list, scaffold, or create domain packs |
143
+ | `papercheck prompts list\|show` | the vendored audit prompt pack |
144
+ | `papercheck mcp` | run the MCP server (stdio) |
145
+
146
+ For a paper repository, [`docs/ci.md`](docs/ci.md) describes a mechanical, LLM-free GitHub Action that runs `scan` and `gate --mechanical-only` on every pull request and sends nothing to any model provider.
147
+
148
+ ## Limitations
149
+
150
+ See [`docs/limitations.md`](docs/limitations.md) for the full version.
151
+
152
+ - papercheck is not a theorem prover and not a replacement for peer review.
153
+ - Catching a semantic error depends entirely on the driving model. papercheck keeps the process disciplined and traceable, but AI findings must be checked independently.
154
+ - papercheck makes no network calls. The agent you drive it with may transmit your manuscript to its model provider, so review the relevant data terms before auditing unpublished work ([`docs/privacy.md`](docs/privacy.md)).
155
+
156
+ ## About
157
+
158
+ Built by Christian Garry.
159
+
160
+ <a href="https://www.linkedin.com/in/christian-tt-garry/"><img src="https://img.shields.io/badge/LinkedIn-Christian_Garry-0A66C2?logo=linkedin&logoColor=white"></a>
161
+ <a href="https://christiangarry.com"><img src="https://img.shields.io/badge/Website-christiangarry.com-22c55e"></a>
162
+
163
+ ## Contributing
164
+
165
+ Contributions welcome — see [`CONTRIBUTING.md`](CONTRIBUTING.md). The architecture is fixed for the 0.x line: strictly MCP and CLI, JSON as the source of truth, gates enforced in code. Prompt changes are guarded by the eval fixtures described in [`docs/agent_eval.md`](docs/agent_eval.md).
+
+ ## References
+
+ <a name="references"></a>
+
+ 1. Rajesh Jayaram, Drew Tyler, David Woodruff, Corinna Cortes, Yossi Matias, Vahab Mirrokni, and Vincent Cohen-Addad. *Towards Automating Scientific Review with Google's Paper Assistant Tool.* arXiv:2606.28277, 2026. <https://arxiv.org/abs/2606.28277>
+
+ BibTeX for the above is in [`docs/references.bib`](docs/references.bib).
+
+ ## License
+
+ MIT — see [`LICENSE`](LICENSE). If papercheck is useful in your work, cite it via [`CITATION.cff`](CITATION.cff).
@@ -0,0 +1,101 @@
+ # Build progress
+
+ A concise log of how papercheck v0.1 was built, phase by phase, with the test
+ count at the end of each phase, followed by what v0.2 added.
+
+ ## Phases
+
+ - **P0 — Repo bootstrap.** Package skeleton, `pyproject.toml` (hatchling),
+   license, canonical `Paper_Audit/` path helpers, first smoke tests.
+   *(tests: 4)*
+ - **P1 — Schemas + state machine.** JSON Schemas for issue / patch / segment /
+   manual-check / state; the stage-gated `AuditState` with `require_at_least`.
+   *(tests: 30)*
+ - **P2 — Scanner.** Deterministic regex TeX scanner producing `structure.json`
+   (theorem envs, labels, refs, citations, draft markers). *(tests: 50, with P3)*
+ - **P3 — Verifiers.** Whitespace-normalized quote verifier, label/ref/cite/file
+   verification, and issue intake that rejects unverifiable findings as
+   `REJECTED_SOURCE_TARGET_INVALID`. *(tests: 50, with P2)*
+ - **P4 — Segments + gate + rendering.** Segment proposal, the mechanical final
+   gate (build + label/ref/cite + draft-marker + open-blocker signals), and
+   JSON→Markdown report rendering (e.g. `10_final_acceptance_gate.md`).
+   *(tests: 61)*
+ - **P5 — MCP server.** FastMCP stdio server exposing 25 tools as thin wrappers
+   over the deterministic core; `papercheck-mcp` entry point. *(tests: 73)*
+ - **P6 — Prompt pack + domain pack + fixtures.** Vendored prompt pack
+   (`prompts/`), domain packs (`domain_packs/`), and the toy fixtures, including
+   the semantic fixtures used for agent-eval. *(tests: 85)*
+ - **P7 — Docs + release polish.** README, `docs/` (limitations, comparison,
+   privacy, agent-eval), `CITATION.cff`, `CONTRIBUTING.md`, this log, the
+   non-LLM `scripts/agent_eval_report.py` helper, and `pyproject.toml` metadata
+   (urls / keywords / classifiers). Packaging fixes made the wheel ship its
+   data dirs and brought the CI matrix to green. *(tests: 87, incl. packaging)*
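The stage-gated `AuditState` from P1 can be pictured with a minimal Python sketch. It assumes the stages form a strict linear order; the stage names, the `advance` method, and the `StageError` exception are hypothetical illustrations, not papercheck's actual API. Only the name `require_at_least` comes from the log above.

```python
from enum import IntEnum


class Stage(IntEnum):
    # Hypothetical stage names; IntEnum makes later stages compare greater.
    INIT = 0
    SCANNED = 1
    SEGMENTED = 2
    REVIEWED = 3
    GATED = 4


class StageError(RuntimeError):
    """Raised when an operation runs before its prerequisite stage."""


class AuditState:
    def __init__(self) -> None:
        self.stage = Stage.INIT

    def advance(self, to: Stage) -> None:
        # Hypothetical rule: stages move forward exactly one step at a time.
        if to != self.stage + 1:
            raise StageError(f"cannot move from {self.stage.name} to {to.name}")
        self.stage = to

    def require_at_least(self, needed: Stage) -> None:
        # Guard placed at the top of any stage-dependent operation.
        if self.stage < needed:
            raise StageError(
                f"requires {needed.name}, current stage is {self.stage.name}"
            )


state = AuditState()
state.advance(Stage.SCANNED)
state.require_at_least(Stage.SCANNED)  # passes: SCANNED >= SCANNED
```

Encoding the order in an `IntEnum` keeps the gate a single comparison, so a tool wrapper can refuse to run (for example, segmenting before scanning) without any per-tool bookkeeping.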
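P3's whitespace-normalized quote verifier and issue intake can be sketched in a few lines of Python, assuming each finding carries its supporting source text in a `quote` field. The `quote_in_source` and `intake` helpers and the `ACCEPTED` status are hypothetical; only the `REJECTED_SOURCE_TARGET_INVALID` status is taken from the log.

```python
import re

_WS = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) to one space."""
    return _WS.sub(" ", text).strip()


def quote_in_source(quote: str, source: str) -> bool:
    """True if `quote` occurs in `source` once both are whitespace-normalized."""
    needle = normalize(quote)
    # An empty quote verifies nothing, so treat it as a failure.
    return bool(needle) and needle in normalize(source)


def intake(finding: dict, source: str) -> dict:
    # Hypothetical intake step: keep findings whose quote verifies; tag the
    # rest with the rejection status named in the build log.
    if quote_in_source(finding.get("quote", ""), source):
        return {**finding, "status": "ACCEPTED"}
    return {**finding, "status": "REJECTED_SOURCE_TARGET_INVALID"}
```

Normalizing both sides means a quote copied from a reflowed paragraph still verifies, while any changed or invented word makes it fail, which is what lets intake reject findings that point at text the paper does not contain.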
33
+
34
+ Test-count progression: **P0: 4 → P1: 30 → P2/P3: 50 → P4: 61 → P5: 73 →
35
+ P6: 85 → packaging: 87.**
36
+
37
+ ## Current status
38
+
39
+ - **v0.1 feature-complete.** 87 mechanical tests passing.
40
+ - **Wheel ships the data dirs** (`schemas/`, `prompts/`, `templates/`,
41
+ `domain_packs/`) via hatch force-include.
42
+ - **CI is green** — lint (`ruff`) + `pytest` + privacy check, on Ubuntu and
43
+ Windows.
44
+
45
+ ## v0.2
46
+
47
+ - **HTML report** — `papercheck report <paper_root>` writes a self-contained
48
+ HTML audit report to `Paper_Audit/report/index.html` from JSON artifacts
49
+ (verdict banner, issue table, segments, manual checks).
50
+ - **Version-compare** — `papercheck compare <old_root> <new_root>` produces a
51
+ structural diff showing added/removed/changed theorems by label, abstract
52
+ changes, and label/citation/section diffs; writes to `Paper_Audit/version_comparison.md`.
53
+ - **Audit profiles** — `papercheck profile [list|show <name>]` exposes advisory
54
+ audit profiles (quick, arxiv, full, journal, no-cloud), each defining an
55
+ ordered sequence of recommended steps and a mechanical_only flag. Profiles are
56
+ advisory only; papercheck does not self-orchestrate.
57
+ - **Domain pack generator** — `papercheck packs scaffold --paper-root <root>`
58
+ deterministically drafts a domain pack from a scanned paper's structure;
59
+ `papercheck packs create <file.json> --paper-root <root>` validates and
60
+ persists a filled-in pack to `Paper_Audit/domain_pack.json`. Four new MCP
61
+ tools (`list_domain_packs`, `get_domain_pack`, `scaffold_domain_pack`,
62
+ `create_domain_pack`) enable agents to fill in domain knowledge while
63
+ papercheck validates and stores the result.
64
+ - **New domain packs** — shipped `pde`, `optimization`, `machine_learning`,
65
+ `general` (fully generic) in addition to existing `stochastic_analysis`,
66
+ `numerical_analysis`.
67
+ - **End-user GitHub Action** — composite action + example workflow
68
+ (`.github/workflows/example-paper-audit.yml`) + `docs/ci.md` for running
69
+ mechanical-only checks (scan, segments, gate --mechanical-only) in CI. Never
70
+ sends the manuscript to any external service; papercheck's own dev CI is
71
+ unchanged.
72
+ - **118 mechanical tests passing** (87 in v0.1 + 31 for new features).
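A rough Python sketch of the label and citation part of the version comparison, assuming both versions are available as raw TeX strings. It uses regular expressions rather than a LaTeX parser, in keeping with the regex scanner from P2, and models only labels and citations; the theorem-body, abstract, and section diffs are left out. All names here (`extract`, `diff_sets`, `compare`) are hypothetical.

```python
import re

# \label{key}
LABEL = re.compile(r"\\label\{([^}]+)\}")
# \cite{a,b}, \citep[p.~3]{a}, \citet*{a}: command, optional [..] args, {keys}
CITE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]+)\}")


def extract(pattern: re.Pattern, tex: str) -> set[str]:
    """Collect keys matched by `pattern`, splitting comma-separated lists."""
    found: set[str] = set()
    for group in pattern.findall(tex):
        found.update(k.strip() for k in group.split(",") if k.strip())
    return found


def diff_sets(old: set[str], new: set[str]) -> dict[str, list[str]]:
    # Sorted lists keep the output deterministic across runs.
    return {"added": sorted(new - old), "removed": sorted(old - new)}


def compare(old_tex: str, new_tex: str) -> dict[str, dict[str, list[str]]]:
    return {
        "labels": diff_sets(extract(LABEL, old_tex), extract(LABEL, new_tex)),
        "citations": diff_sets(extract(CITE, old_tex), extract(CITE, new_tex)),
    }
```

For example, renaming `\label{thm:main}` to `\label{thm:main2}` shows up as one added and one removed label, which is how a label-keyed comparison surfaces renamed theorems.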
73
+
74
+ ## v0.2 — Explicitly excluded by design
75
+
76
+ These are intentionally not built:
77
+
78
+ - **LLM provider adapters** — papercheck never calls a model itself; the agent
79
+ driving the audit supplies the reasoning.
80
+ - **Self-orchestration** — papercheck is an MCP server and CLI, not an
81
+ autonomous agent.
82
+
83
+ ## Explicitly deferred (future or out of scope)
84
+
85
+ - **LLM-in-CI evals** — Tier 2 agent-eval stays manual; CI runs no model.
86
+ - **LaTeX AST parsing** — fine-grained structural analysis via AST traversal.
87
+ - **Web UI** — the CLI and MCP tools form the user surface.
88
+
89
+ ## Deviations from spec
90
+
91
+ Small, deliberate departures worth recording:
92
+
93
+ - **Handler `render` → `render_reports`.** The MCP/handler entry point is named
94
+ `render_reports` (the CLI command stays `render`) to avoid colliding with the
95
+ core `render` module name.
96
+ - **Id-assign before schema-validate in `issues.py`.** `submit_issue` assigns a
97
+ missing/blank `issue_id` *before* JSON-Schema validation, because a blank id
98
+ fails the schema's id pattern; validation then runs on the completed record.
99
+ - **`amsthm` added to the clean fixture.** `tests/fixtures/toy_clean_paper`
100
+ loads `amsthm` so theorem environments are well-formed and the fixture builds
101
+ cleanly under the gate's latexmk build.
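The id-assign-before-validate ordering described for `issues.py` can be illustrated with a small Python sketch. To stay standard-library only, it substitutes a regular expression for JSON-Schema validation; the `ISS-0001` id format, the counter, and the `validate` helper are assumptions, and this `submit_issue` is a hypothetical stand-in rather than papercheck's actual code.

```python
import re
from itertools import count

# Hypothetical id format; papercheck's real schema pattern may differ.
ISSUE_ID_PATTERN = re.compile(r"ISS-\d{4}")
_ids = count(1)


def validate(record: dict) -> None:
    """Stand-in for JSON-Schema validation that checks only the id pattern."""
    issue_id = str(record.get("issue_id", ""))
    if not ISSUE_ID_PATTERN.fullmatch(issue_id):
        raise ValueError(f"issue_id {issue_id!r} does not match the id pattern")


def submit_issue(record: dict) -> dict:
    # Hypothetical sketch of the ordering, not papercheck's issues.py.
    record = dict(record)  # do not mutate the caller's dict
    # Assign first: a missing or blank id would fail the pattern check below.
    if not str(record.get("issue_id") or "").strip():
        record["issue_id"] = f"ISS-{next(_ids):04d}"
    validate(record)  # validation runs on the completed record
    return record
```

Reversing the two steps would make every issue submitted without an id fail validation, which is exactly the failure the deviation note avoids; a malformed id supplied by the caller is still rejected.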