papercheck-0.3.0.tar.gz
- papercheck-0.3.0/.gitattributes +3 -0
- papercheck-0.3.0/.github/workflows/ci.yml +26 -0
- papercheck-0.3.0/.github/workflows/example-paper-audit.yml +40 -0
- papercheck-0.3.0/.github/workflows/publish.yml +42 -0
- papercheck-0.3.0/.gitignore +18 -0
- papercheck-0.3.0/CHANGELOG.md +84 -0
- papercheck-0.3.0/CITATION.cff +23 -0
- papercheck-0.3.0/CONTRIBUTING.md +77 -0
- papercheck-0.3.0/LICENSE +21 -0
- papercheck-0.3.0/PKG-INFO +177 -0
- papercheck-0.3.0/PROGRESS.md +101 -0
- papercheck-0.3.0/README.md +152 -0
- papercheck-0.3.0/action.yml +59 -0
- papercheck-0.3.0/docs/adapting_to_new_papers.md +15 -0
- papercheck-0.3.0/docs/agent_eval.md +116 -0
- papercheck-0.3.0/docs/assets/demo-placeholder.svg +28 -0
- papercheck-0.3.0/docs/assets/logo.png +0 -0
- papercheck-0.3.0/docs/assets/logo.svg +47 -0
- papercheck-0.3.0/docs/ci.md +145 -0
- papercheck-0.3.0/docs/comparison.md +35 -0
- papercheck-0.3.0/docs/failure_modes.md +25 -0
- papercheck-0.3.0/docs/limitations.md +57 -0
- papercheck-0.3.0/docs/pat_principles.md +10 -0
- papercheck-0.3.0/docs/privacy.md +36 -0
- papercheck-0.3.0/docs/references.bib +13 -0
- papercheck-0.3.0/docs/reviewer_roles.md +37 -0
- papercheck-0.3.0/docs/severity_scale.md +11 -0
- papercheck-0.3.0/docs/workflow.md +20 -0
- papercheck-0.3.0/domain_packs/README.md +47 -0
- papercheck-0.3.0/domain_packs/general.yaml +18 -0
- papercheck-0.3.0/domain_packs/machine_learning.yaml +18 -0
- papercheck-0.3.0/domain_packs/numerical_analysis.yaml +17 -0
- papercheck-0.3.0/domain_packs/optimization.yaml +18 -0
- papercheck-0.3.0/domain_packs/pde.yaml +18 -0
- papercheck-0.3.0/domain_packs/stochastic_analysis.yaml +17 -0
- papercheck-0.3.0/eval/RESULTS.md +38 -0
- papercheck-0.3.0/eval/findings.json +121 -0
- papercheck-0.3.0/eval/run_eval.py +117 -0
- papercheck-0.3.0/examples/toy_clean_paper/expected_gate.json +3 -0
- papercheck-0.3.0/examples/toy_clean_paper/main.tex +36 -0
- papercheck-0.3.0/examples/toy_clean_paper/refs.bib +6 -0
- papercheck-0.3.0/profiles/profiles.json +74 -0
- papercheck-0.3.0/prompts/00_bootstrap_orchestrator.md +18 -0
- papercheck-0.3.0/prompts/01_repository_inspector.md +23 -0
- papercheck-0.3.0/prompts/02_build_and_source_hygiene.md +22 -0
- papercheck-0.3.0/prompts/03_segmenter_and_budgeter.md +36 -0
- papercheck-0.3.0/prompts/04_theorem_inventory.md +20 -0
- papercheck-0.3.0/prompts/05_assumption_dependency_auditor.md +20 -0
- papercheck-0.3.0/prompts/06_equation_indexer.md +16 -0
- papercheck-0.3.0/prompts/07_formalist_proof_auditor.md +30 -0
- papercheck-0.3.0/prompts/08_domain_specialist_auditor.md +13 -0
- papercheck-0.3.0/prompts/09_main_theorem_chain_auditor.md +30 -0
- papercheck-0.3.0/prompts/10_numerical_experiments_auditor.md +25 -0
- papercheck-0.3.0/prompts/11_related_work_and_novelty_auditor.md +18 -0
- papercheck-0.3.0/prompts/12_notation_consistency_auditor.md +21 -0
- papercheck-0.3.0/prompts/13_source_hygiene_auditor.md +29 -0
- papercheck-0.3.0/prompts/14_global_synthesis.md +25 -0
- papercheck-0.3.0/prompts/15_issue_adjudicator.md +38 -0
- papercheck-0.3.0/prompts/16_patch_planner.md +22 -0
- papercheck-0.3.0/prompts/17_patcher.md +20 -0
- papercheck-0.3.0/prompts/18_regression_auditor.md +23 -0
- papercheck-0.3.0/prompts/19_final_acceptance_gate.md +28 -0
- papercheck-0.3.0/prompts/20_version_comparison_auditor.md +19 -0
- papercheck-0.3.0/pyproject.toml +60 -0
- papercheck-0.3.0/schemas/.gitkeep +0 -0
- papercheck-0.3.0/schemas/domain_pack.schema.json +28 -0
- papercheck-0.3.0/schemas/issue.schema.json +121 -0
- papercheck-0.3.0/schemas/manual_check.schema.json +29 -0
- papercheck-0.3.0/schemas/patch.schema.json +52 -0
- papercheck-0.3.0/schemas/segment.schema.json +55 -0
- papercheck-0.3.0/schemas/state.schema.json +65 -0
- papercheck-0.3.0/scripts/agent_eval_report.py +184 -0
- papercheck-0.3.0/scripts/privacy_check.py +65 -0
- papercheck-0.3.0/src/papercheck/__init__.py +3 -0
- papercheck-0.3.0/src/papercheck/cli/__init__.py +1 -0
- papercheck-0.3.0/src/papercheck/cli/main.py +324 -0
- papercheck-0.3.0/src/papercheck/core/__init__.py +1 -0
- papercheck-0.3.0/src/papercheck/core/_resources.py +38 -0
- papercheck-0.3.0/src/papercheck/core/adjudicate.py +228 -0
- papercheck-0.3.0/src/papercheck/core/compare.py +258 -0
- papercheck-0.3.0/src/papercheck/core/domainpack.py +148 -0
- papercheck-0.3.0/src/papercheck/core/gate.py +178 -0
- papercheck-0.3.0/src/papercheck/core/html_report.py +295 -0
- papercheck-0.3.0/src/papercheck/core/issues.py +86 -0
- papercheck-0.3.0/src/papercheck/core/ledger.py +157 -0
- papercheck-0.3.0/src/papercheck/core/paths.py +50 -0
- papercheck-0.3.0/src/papercheck/core/profiles.py +51 -0
- papercheck-0.3.0/src/papercheck/core/render.py +211 -0
- papercheck-0.3.0/src/papercheck/core/schemas.py +45 -0
- papercheck-0.3.0/src/papercheck/core/segments.py +198 -0
- papercheck-0.3.0/src/papercheck/core/state.py +168 -0
- papercheck-0.3.0/src/papercheck/core/texscan.py +801 -0
- papercheck-0.3.0/src/papercheck/core/verify.py +66 -0
- papercheck-0.3.0/src/papercheck/core/webserve.py +612 -0
- papercheck-0.3.0/src/papercheck/mcp_server/__init__.py +1 -0
- papercheck-0.3.0/src/papercheck/mcp_server/handlers.py +346 -0
- papercheck-0.3.0/src/papercheck/mcp_server/server.py +216 -0
- papercheck-0.3.0/templates/assumption_record.md +13 -0
- papercheck-0.3.0/templates/equation_record.md +11 -0
- papercheck-0.3.0/templates/final_gate.md +11 -0
- papercheck-0.3.0/templates/issue.md +23 -0
- papercheck-0.3.0/templates/patch_record.md +13 -0
- papercheck-0.3.0/templates/report_header.md +11 -0
- papercheck-0.3.0/templates/segment_record.md +12 -0
- papercheck-0.3.0/templates/theorem_record.md +18 -0
- papercheck-0.3.0/tests/fixtures/toy_bad_gronwall_constant/expected.json +8 -0
- papercheck-0.3.0/tests/fixtures/toy_bad_gronwall_constant/main.tex +43 -0
- papercheck-0.3.0/tests/fixtures/toy_bad_label_refs/expected.json +7 -0
- papercheck-0.3.0/tests/fixtures/toy_bad_label_refs/main.tex +30 -0
- papercheck-0.3.0/tests/fixtures/toy_bad_label_refs/refs.bib +13 -0
- papercheck-0.3.0/tests/fixtures/toy_bad_label_refs/sec.tex +10 -0
- papercheck-0.3.0/tests/fixtures/toy_clean_paper/expected_gate.json +3 -0
- papercheck-0.3.0/tests/fixtures/toy_clean_paper/main.tex +36 -0
- papercheck-0.3.0/tests/fixtures/toy_clean_paper/refs.bib +6 -0
- papercheck-0.3.0/tests/fixtures/toy_false_positive_trap/expected.json +6 -0
- papercheck-0.3.0/tests/fixtures/toy_false_positive_trap/main.tex +45 -0
- papercheck-0.3.0/tests/fixtures/toy_missing_assumption/expected.json +9 -0
- papercheck-0.3.0/tests/fixtures/toy_missing_assumption/main.tex +42 -0
- papercheck-0.3.0/tests/fixtures/toy_overclaimed_abstract/expected.json +9 -0
- papercheck-0.3.0/tests/fixtures/toy_overclaimed_abstract/main.tex +39 -0
- papercheck-0.3.0/tests/test_agent_eval_replay.py +78 -0
- papercheck-0.3.0/tests/test_cli.py +103 -0
- papercheck-0.3.0/tests/test_compare.py +88 -0
- papercheck-0.3.0/tests/test_domain_packs.py +132 -0
- papercheck-0.3.0/tests/test_domainpack_gen.py +80 -0
- papercheck-0.3.0/tests/test_fixtures.py +60 -0
- papercheck-0.3.0/tests/test_gate.py +85 -0
- papercheck-0.3.0/tests/test_handlers.py +167 -0
- papercheck-0.3.0/tests/test_html_report.py +90 -0
- papercheck-0.3.0/tests/test_issues_intake.py +100 -0
- papercheck-0.3.0/tests/test_ledger.py +68 -0
- papercheck-0.3.0/tests/test_mcp_server.py +46 -0
- papercheck-0.3.0/tests/test_packaging.py +14 -0
- papercheck-0.3.0/tests/test_profiles.py +36 -0
- papercheck-0.3.0/tests/test_prompts_pack.py +84 -0
- papercheck-0.3.0/tests/test_render.py +75 -0
- papercheck-0.3.0/tests/test_schemas.py +120 -0
- papercheck-0.3.0/tests/test_segments.py +75 -0
- papercheck-0.3.0/tests/test_skeleton.py +28 -0
- papercheck-0.3.0/tests/test_state.py +81 -0
- papercheck-0.3.0/tests/test_texscan.py +96 -0
- papercheck-0.3.0/tests/test_texscan_ast.py +182 -0
- papercheck-0.3.0/tests/test_verify.py +68 -0
- papercheck-0.3.0/tests/test_webserve.py +168 -0
papercheck-0.3.0/.github/workflows/ci.yml
ADDED
@@ -0,0 +1,26 @@
+name: CI
+
+on:
+  pull_request:
+  push:
+
+jobs:
+  test:
+    strategy:
+      fail-fast: false
+      matrix:
+        os: [ubuntu-latest, windows-latest]
+    runs-on: ${{ matrix.os }}
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Install
+        run: python -m pip install -e ".[dev]"
+      - name: Ruff
+        run: ruff check .
+      - name: Pytest
+        run: pytest -q
+      - name: Privacy check
+        run: python scripts/privacy_check.py
papercheck-0.3.0/.github/workflows/example-paper-audit.yml
ADDED
@@ -0,0 +1,40 @@
+name: "Papercheck Mechanical Audit"
+
+# Trigger on pull requests and manual workflow runs
+on:
+  pull_request:
+  workflow_dispatch:
+
+jobs:
+  mechanical-audit:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      # USAGE OPTION 1: Use the published GitHub Action from the papercheck repo
+      # (requires papercheck to be published and the action to be in the OWNER/papercheck repo)
+      # Uncomment the step below and comment out USAGE OPTION 2:
+      #
+      # - name: Run papercheck mechanical audit
+      # uses: OWNER/papercheck@v0.2.0
+      # with:
+      # paper-root: "."
+
+      # USAGE OPTION 2: Install and run papercheck directly in the workflow
+      # Use this if you haven't published the action to GitHub yet, or for testing.
+      # Replace "." with the path to your paper root (e.g., "paper/" if sources are in a subdirectory).
+      - name: Run papercheck mechanical audit
+        uses: actions/checkout@v4 # This ensures we have the latest code
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Install papercheck
+        run: pip install papercheck
+      - name: Run scan
+        run: papercheck scan "."
+      - name: Run segments
+        run: papercheck segments "."
+      - name: Run gate check
+        run: papercheck gate "." --mechanical-only
papercheck-0.3.0/.github/workflows/publish.yml
ADDED
@@ -0,0 +1,42 @@
+name: Publish to PyPI
+
+# Publishes papercheck to PyPI when a GitHub Release is published, using PyPI
+# Trusted Publishing (OpenID Connect) — no API token or password is stored.
+# The matching trusted publisher must be registered on PyPI:
+# project: papercheck owner: cgarryZA repo: papercheck
+# workflow: publish.yml environment: pypi
+
+on:
+  release:
+    types: [published]
+  workflow_dispatch:
+
+jobs:
+  build-and-publish:
+    name: Build and publish to PyPI
+    runs-on: ubuntu-latest
+    environment:
+      name: pypi
+      url: https://pypi.org/p/papercheck
+    permissions:
+      id-token: write # required for OIDC trusted publishing
+      contents: read
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Build sdist and wheel
+        run: |
+          python -m pip install --upgrade build
+          python -m build
+
+      - name: Check the built distributions
+        run: |
+          python -m pip install --upgrade twine
+          twine check dist/*
+
+      - name: Publish to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
papercheck-0.3.0/.gitignore
ADDED
@@ -0,0 +1,18 @@
+__pycache__/
+*.egg-info/
+.pytest_cache/
+build/
+dist/
+dist_check/
+*.whl
+.ruff_cache/
+Paper_Audit/
+
+# LaTeX build products (e.g. generated by the gate's latexmk build in fixtures)
+*.aux
+*.fdb_latexmk
+*.fls
+*.log
+*.pdf
+*.out
+*.synctex.gz
papercheck-0.3.0/CHANGELOG.md
ADDED
@@ -0,0 +1,84 @@
+# Changelog
+
+All notable changes to papercheck are documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [0.3.0] — 2026-07-02
+
+### Added
+
+- **LaTeX-AST scanner** — `texscan.scan` now builds a real LaTeX node tree
+(via `pylatexenc`) instead of regex/line matching, with a graceful regex
+fallback on malformed input. Output schema is unchanged.
+- **Interactive web UI** — `papercheck serve <paper_root>` starts a local
+stdlib HTTP server with a filterable issue table (by status/severity/category)
+and click-to-source; jailed source reader, XSS-safe rendering.
+
+### Fixed
+
+- Single-line `\begin{theorem}\label{...}...\end{theorem}` now attributes the
+label to the theorem environment; macro-defined (`\newtheorem`) and nested
+environments are handled.
+
+## [0.2.0] — 2026-07-02
+
+### Added
+
+- **HTML audit report** — `papercheck report <paper_root>` writes a
+self-contained HTML report to `Paper_Audit/report/index.html` with verdict
+banner, issue table, segments, and manual checks.
+- **Version-compare command** — `papercheck compare <old_root> <new_root>`
+produces a structural diff of two paper versions (theorems by label,
+abstract, labels, citations, equations) and writes `Paper_Audit/version_comparison.md`.
+- **Audit profiles** — `papercheck profile [list|show <name>]` exposes advisory
+audit profiles (quick, arxiv, full, journal, no-cloud), each with an ordered
+sequence of recommended steps and a mechanical_only flag. Profiles are
+advisory only; papercheck does not execute them automatically.
+- **Domain pack generator and CLI** — `papercheck packs [list|show <name>|scaffold
+--paper-root <root>|create <file.json> --paper-root <root>]` enables building
+paper-specific domain packs. Scaffold deterministically extracts structure;
+create validates and persists a filled-in pack to `Paper_Audit/domain_pack.json`.
+- **Four new MCP tools** — `list_domain_packs`, `get_domain_pack`,
+`scaffold_domain_pack`, `create_domain_pack`. Agents fill in domain knowledge;
+papercheck validates and stores it.
+- **New shipped domain packs** — `pde`, `optimization`, `machine_learning`,
+`general` (fully generic) join existing `stochastic_analysis`, `numerical_analysis`.
+- **End-user GitHub Action** — composite action in `action.yml` + example
+workflow (`.github/workflows/example-paper-audit.yml`) + documentation
+(`docs/ci.md`) for running mechanical-only checks (scan, segments, gate
+--mechanical-only) in CI without sending the manuscript to external services.
+- **29 MCP tools** (up from 25) — all new tools wired to the stdio MCP server.
+- **118 mechanical tests** — all passing; includes 31 new tests for v0.2 features.
+
+## [0.1.0] — 2026-06-01
+
+### Added
+
+- **Deterministic LaTeX scanner** — regex-based structure extraction producing
+`structure.json` (theorems, labels, refs, citations, draft markers) with no
+guessing and no network calls.
+- **Stage-gated state machine** — fixed workflow stages (INIT → SCANNED →
+SEGMENTED → INVENTORIED → AUDITING → SYNTHESIZED → ADJUDICATED →
+PATCH_PLANNED → PATCHING → REGRESSED → GATED) enforced in code.
+- **Schema-validated ledgers** — JSON Schemas for issues, patches, segments,
+manual checks, and state; unverifiable findings rejected as
+`REJECTED_SOURCE_TARGET_INVALID` before entering the ledger.
+- **Verified issue intake** — exact quote verification, label/ref/cite/file
+checking, with unverifiable issues rejected mechanically.
+- **Mechanical final gate** — code-enforced verdict (READY / NOT READY) based on
+build status, label/ref/cite counts, draft markers, and open blockers.
+- **Markdown report rendering** — JSON-to-Markdown for all workflow stages
+(e.g., `10_final_acceptance_gate.md`).
+- **Typer CLI** — 7 commands (scan, segments, render, init, submit-issue,
+adjudicate, gate) with full help text.
+- **FastMCP stdio server** — 25 tools exposing the deterministic core to any
+MCP-capable agent (e.g., Claude Code).
+- **Vendored prompt pack** — domain-agnostic auditing prompt in `prompts/`.
+- **Shipped domain packs** — `stochastic_analysis`, `numerical_analysis` in
+`domain_packs/` with categorized theorem types and technique keywords.
+- **Evaluation fixtures** — toy papers (clean, draft-marked, unresolved refs)
+for testing and agent-eval workflows.
+- **CI + packaging** — green `ruff` lint, 87 mechanical tests, privacy check,
+wheel with bundled data dirs (schemas, prompts, templates, domain_packs).
papercheck-0.3.0/CITATION.cff
ADDED
@@ -0,0 +1,23 @@
+cff-version: 1.2.0
+message: "If you use papercheck, please cite it using these metadata."
+title: papercheck
+type: software
+version: 0.3.0
+license: MIT
+abstract: >-
+  papercheck is a reproducible audit harness for mathematical LaTeX papers. It
+  turns a paper into a stage-gated adversarial audit: deterministic structure
+  extraction, JSON-Schema-validated issue ledgers with mechanical quote
+  verification, and a code-enforced final gate that any MCP-capable agent can
+  drive but cannot skip. It does not itself call an LLM and is not a theorem
+  prover or a replacement for peer review.
+authors:
+  - name: "papercheck contributors"
+keywords:
+  - latex
+  - mathematics
+  - research-software
+  - reproducibility
+  - paper-review
+  - mcp
+  - audit
papercheck-0.3.0/CONTRIBUTING.md
ADDED
@@ -0,0 +1,77 @@
+# Contributing to papercheck
+
+Thanks for your interest. papercheck is small, deterministic, and deliberately
+scoped. Please read the discipline notes below before opening a PR — they are
+not negotiable for the v0.1 line.
+
+## Setup
+
+```bash
+python -m pip install -e ".[dev]"
+```
+
+This installs papercheck in editable mode with the dev extras (`pytest`,
+`ruff`).
+
+## Run the tests
+
+```bash
+pytest -q
+```
+
+The suite is fully mechanical — no LLM calls, no network. It must stay green on
+both Linux and Windows.
+
+## Lint
+
+```bash
+ruff check .
+```
+
+CI runs the same command; keep it clean.
+
+## Privacy check (required)
+
+```bash
+python scripts/privacy_check.py
+```
+
+This must exit 0. It scans the repository for forbidden identifiers tied to a
+specific unpublished paper. **Never add paper-specific unpublished content** —
+manuscripts, real author names, private quotes, or anything that identifies a
+non-public paper — to this repository. Toy fixtures under `tests/fixtures/` are
+synthetic and self-contained by design; keep them that way.
+
+## Phase / spec discipline (v0.1)
+
+The architecture is fixed for the v0.1 line. PRs that change these will be
+declined:
+
+- **No LLM provider code.** papercheck never calls a model API. The reasoning
+lives in the driving agent (Claude Code or any MCP client) or in a human. Do
+not add provider adapters, API clients, or keys.
+- **JSON is the source of truth.** Issues, patches, segments, manual checks, and
+state are JSON validated against the schemas in `schemas/`. Markdown reports
+are *rendered from* JSON, never authored directly.
+- **Gates are enforced in code.** Stage ordering and the final gate live in the
+Python core, not in prompts. Do not move enforcement into agent instructions.
+
+If you have a change that needs one of these, open an issue to discuss it as a
+future major-version direction first (see the deferred list in `PROGRESS.md`).
+
+## How prompts are tested
+
+Prompts and the audit workflow are validated in two tiers:
+
+- **Tier 1 — mechanical fixtures.** The pytest suite exercises the deterministic
+core against the toy fixtures in `tests/fixtures/` (e.g. the label/ref fixture
+is mechanically detectable). This runs in CI.
+- **Tier 2 — agent-eval.** A human or MCP agent runs the full audit workflow
+against the *semantic* fixtures and checks that planted defects are found and
+survive adjudication, and that the false-positive trap is correctly rejected.
+This is **not** run in CI (it requires an LLM). See
+[`docs/agent_eval.md`](docs/agent_eval.md) for the exact procedure, and use
+`scripts/agent_eval_report.py` to summarize a completed audit.
+
+When you change a prompt, describe in your PR which fixtures you re-ran under
+Tier 2 and what you observed.
papercheck-0.3.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 papercheck contributors
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
papercheck-0.3.0/PKG-INFO
ADDED
@@ -0,0 +1,177 @@
+Metadata-Version: 2.4
+Name: papercheck
+Version: 0.3.0
+Summary: A reproducible audit harness for mathematical LaTeX papers.
+Project-URL: Homepage, https://github.com/OWNER-PLACEHOLDER/papercheck
+Project-URL: Repository, https://github.com/OWNER-PLACEHOLDER/papercheck
+License: MIT
+License-File: LICENSE
+Keywords: audit,latex,mathematics,mcp,paper-review,reproducibility,research-software
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Scientific/Engineering :: Mathematics
+Requires-Python: >=3.10
+Requires-Dist: jsonschema>=4.0
+Requires-Dist: mcp>=1.0
+Requires-Dist: pylatexenc>=2.10
+Requires-Dist: typer>=0.12
+Provides-Extra: dev
+Requires-Dist: pytest>=8; extra == 'dev'
+Requires-Dist: ruff>=0.6; extra == 'dev'
+Description-Content-Type: text/markdown
+
+<div align="center">
+
+<img src="docs/assets/logo.png" alt="papercheck" width="680"/>
+
+<br/>
+
+A reproducible audit harness for mathematical LaTeX papers.
+
+<a href="https://github.com/cgarryZA/papercheck/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-22c55e"></a>
+<a href="https://www.python.org/"><img src="https://img.shields.io/badge/python-3.10%2B-3776AB?logo=python&logoColor=white"></a>
+<img src="https://img.shields.io/badge/tests-140_passing-16a34a">
+<img src="https://img.shields.io/badge/version-0.3.0-8b5cf6">
+<a href="https://modelcontextprotocol.io"><img src="https://img.shields.io/badge/MCP-server-ff6d00"></a>
+
+</div>
+
+papercheck extracts a paper's structure, records findings in schema-validated ledgers that are checked against the exact source text, and runs a final gate that reports whether the paper is ready. It runs as a command-line tool, or as an MCP server that an agent (Claude Code, Codex, Cursor, or similar) drives while you supply the mathematical judgement.
+
+The point is discipline. Instead of asking one model to "review the paper," papercheck segments the manuscript, inventories the claims, runs narrow hostile audits, verifies each finding against the source, adjudicates, patches only what was accepted, and gates. The gates are enforced in code: an agent cannot patch before adjudication, and a finding whose quote does not appear in the source never enters the ledger.
+
+The staged pipeline — segment, budget, specialist review, synthesize, adjudicate — follows the approach described in Google Research's Paper Assistant Tool (PAT) [[1]](#references), adapted into a small, open, model-agnostic harness that runs no models of its own.
+
+## What it is
+
+A deterministic Python core with two frontends — a CLI and an MCP server. papercheck itself never calls a model; the reasoning lives in the agent or in your hands at the CLI. What the core provides:
+
+- LaTeX-AST structure extraction (theorems, labels, references, citations, equations, draft markers) into `structure.json`.
+- Schema-validated issue ledgers, each finding tied to a file and line.
+- Quote verification at intake: if a finding's quoted text is not in the source, it is rejected as `REJECTED_SOURCE_TARGET_INVALID` before it reaches the proposed ledger.
+- A stage-gated state machine (`INIT → SCANNED → … → ADJUDICATED → … → GATED`) that refuses out-of-order operations.
+- A CLI, an MCP server (29 tools), and a local web UI.
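
Reviewer note: the quote-verification bullet above can be illustrated with a minimal sketch. This is not papercheck's code. `Finding`, `intake`, and the `sources` mapping are hypothetical names, and the sketch assumes "verification" means a verbatim substring match within the named file; only the two status strings come from the README.

```python
from dataclasses import dataclass

# Status names taken from the README; everything else here is hypothetical.
REJECTED = "REJECTED_SOURCE_TARGET_INVALID"
PROPOSED = "PROPOSED"


@dataclass(frozen=True)
class Finding:
    file: str   # path of the source file the finding points at
    line: int   # 1-indexed line the finding is tied to
    quote: str  # text the agent claims appears in that file


def intake(finding: Finding, sources: dict[str, str]) -> str:
    """Return PROPOSED only if the quote occurs verbatim in the named file."""
    text = sources.get(finding.file)
    if text is None or not finding.quote.strip():
        return REJECTED  # unknown file or empty quote
    return PROPOSED if finding.quote in text else REJECTED


sources = {"main.tex": "Let $f$ be Lipschitz with constant $L$.\n"}
good = Finding("main.tex", 1, "Lipschitz with constant $L$")
bad = Finding("main.tex", 1, "uniformly bounded")
print(intake(good, sources))  # PROPOSED
print(intake(bad, sources))   # REJECTED_SOURCE_TARGET_INVALID
```

The design point is that rejection happens before anything is written to a ledger, so a hallucinated quote leaves no trace to adjudicate later.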
+
+## Why
+
+A single "review my paper" prompt tends to fail in three ways. papercheck handles each in code rather than in prompting:
+
+| Failure mode | What usually happens | How papercheck handles it |
+| --- | --- | --- |
+| Hallucinated findings | the model invents a problem that isn't in the text | `submit_issue` matches the quoted text against the source; no match means the finding is rejected before it enters the ledger |
+| Patch before proof | the model rewrites before anyone knows what is real | patches are refused unless the state is at `ADJUDICATED` and the issue is `ACCEPTED` |
+| Skipped gate | "looks good to me" with no audit trail | the final gate is computed from mechanical signals and accepted blockers, and returns one of a fixed set of verdicts |
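
Reviewer note: the "Skipped gate" row can be sketched as a pure function of mechanical signals. This is an illustration under stated assumptions, not the contents of `gate.py`. The signal names follow the 0.1.0 changelog entry (build status, label/ref/cite counts, draft markers, open blockers) and the two verdict strings come from the same entry; the `Signals` class, its field names, and the non-zero exit value are hypothetical, and the real verdict set may be larger.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    build_ok: bool            # did the LaTeX build succeed
    unresolved_refs: int      # references with no matching label
    undefined_citations: int  # citations missing from the bibliography
    draft_markers: int        # TODO/FIXME-style markers left in the source
    open_blockers: int        # accepted blocking issues not yet resolved


def gate(s: Signals) -> str:
    """Compute the verdict from the signals alone; no model opinion enters."""
    clean = (
        s.build_ok
        and s.unresolved_refs == 0
        and s.undefined_citations == 0
        and s.draft_markers == 0
        and s.open_blockers == 0
    )
    return "READY" if clean else "NOT READY"


def exit_code(verdict: str) -> int:
    # The README states exit 0 means READY; the value 1 is an assumption.
    return 0 if verdict == "READY" else 1


clean_paper = Signals(True, 0, 0, 0, 0)
draft_paper = Signals(True, 2, 0, 1, 0)
print(gate(clean_paper), exit_code(gate(clean_paper)))  # READY 0
print(gate(draft_paper), exit_code(gate(draft_paper)))  # NOT READY 1
```

Because the verdict is a deterministic function of recorded signals, the same inputs always produce the same result, which is what makes the gate auditable.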
+
+## Quickstart
+
+```bash
+pipx install papercheck # or: pip install papercheck
+
+papercheck scan path/to/paper # extract structure.json
+papercheck segments path/to/paper # propose audit segments and budgets
+papercheck gate path/to/paper --mechanical-only
+papercheck report path/to/paper # self-contained HTML report
+papercheck serve path/to/paper # local web UI
+```
+
+Run it on the bundled example (prints `==== READY ====`):
+
+```bash
+papercheck gate examples/toy_clean_paper --mechanical-only
+```
+
+## Driving it from an agent (MCP)
+
+papercheck ships a FastMCP server exposing 29 tools plus the audit prompt pack. Register it once:
+
+```bash
+claude mcp add papercheck -- papercheck-mcp
+```
+
+Then ask the agent to audit a paper. It walks the workflow through the MCP tools — `init_audit`, `run_scan`, `propose_segments`, `submit_issue`, `adjudicate_issue`, `run_gate` — under the same code-enforced gates as the CLI.
+
+<details>
+<summary>Generating a paper-specific domain pack</summary>
+
+<br/>
+
+papercheck does not call a model, so tailoring the audit to a paper's field is a two-step split: the agent reads the paper and drafts the pack, papercheck validates and stores it.
+
+```bash
+papercheck packs scaffold --paper-root ./paper # deterministic draft from the scan
+papercheck packs create draft.json --paper-root ./paper # validated -> Paper_Audit/domain_pack.json
+```
+
+The same is available through the `scaffold_domain_pack` and `create_domain_pack` MCP tools. Generic packs ship for stochastic analysis, PDE, numerical analysis, optimization, machine-learning theory, and a general fallback.
+
+</details>
+
+## How it works
+
+```mermaid
+flowchart LR
+A[INIT] --> B[SCANNED] --> C[SEGMENTED] --> D[INVENTORIED] --> E[AUDITING]
+E --> F[SYNTHESIZED] --> G[ADJUDICATED] --> H[PATCH_PLANNED]
+H --> I[PATCHING] --> J[REGRESSED] --> K[GATED]
+
+E -. submit_issue .-> V{quote and label verified?}
+V -- no --> X[REJECTED_SOURCE_TARGET_INVALID]
+V -- yes --> L[PROPOSED ledger]
+style X fill:#7f1d1d,stroke:#ef4444,color:#fff
+style K fill:#14532d,stroke:#22c55e,color:#fff
+```
+
+One deterministic core, two frontends, no LLM calls inside the harness. papercheck stays strictly an MCP server and CLI — no provider adapters, no self-orchestration — so it works with any model and makes no network calls of its own.
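
Reviewer note: the stage ordering in the flowchart above can be sketched as a small state machine. This is a hedged illustration, not the contents of `state.py`. The stage names come from the flowchart; `advance`, `may_patch`, and `StageOrderError` are hypothetical names, and the sketch assumes that only single forward steps are legal and reads "at `ADJUDICATED`" as "has reached `ADJUDICATED`".

```python
STAGES = [
    "INIT", "SCANNED", "SEGMENTED", "INVENTORIED", "AUDITING", "SYNTHESIZED",
    "ADJUDICATED", "PATCH_PLANNED", "PATCHING", "REGRESSED", "GATED",
]


class StageOrderError(RuntimeError):
    """Raised when a transition skips ahead or moves backwards."""


def advance(current: str, target: str) -> str:
    """Allow only the single next stage; refuse every other transition."""
    if STAGES.index(target) != STAGES.index(current) + 1:
        raise StageOrderError(f"cannot go from {current} to {target}")
    return target


def may_patch(state: str, issue_status: str) -> bool:
    """Patching needs an ACCEPTED issue and a state that reached ADJUDICATED."""
    reached = STAGES.index(state) >= STAGES.index("ADJUDICATED")
    return reached and issue_status == "ACCEPTED"


state = advance("INIT", "SCANNED")
print(state)                                 # SCANNED
print(may_patch("AUDITING", "ACCEPTED"))     # False: adjudication not reached
print(may_patch("ADJUDICATED", "PROPOSED"))  # False: issue not accepted
print(may_patch("ADJUDICATED", "ACCEPTED"))  # True

try:
    advance("SCANNED", "ADJUDICATED")        # skipping stages is refused
except StageOrderError as err:
    print(err)                               # cannot go from SCANNED to ADJUDICATED
```

Keeping the ordering in a plain list makes the rule checkable in a unit test, which matches the stated principle that gates live in code rather than in prompts.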

## Commands

| Command | What it does |
| --- | --- |
| `papercheck init` | create the `Paper_Audit/` workspace and state file |
| `papercheck scan` | LaTeX-AST structure extraction into `structure.json` |
| `papercheck segments` | heuristic segment and budget proposal |
| `papercheck gate [--mechanical-only]` | compute the final verdict (exit 0 = READY) |
| `papercheck verify-quote` | check a quote against a source file |
| `papercheck report` | self-contained HTML audit report |
| `papercheck serve` | local web UI (filter issues, click through to source) |
| `papercheck compare old/ new/` | structural diff of two paper versions |
| `papercheck profile list\|show` | advisory audit pipelines (`quick`, `arxiv`, `full`, …) |
| `papercheck packs …` | list, scaffold, or create domain packs |
| `papercheck prompts list\|show` | the vendored audit prompt pack |
| `papercheck mcp` | run the MCP server (stdio) |

For a paper repository, [`docs/ci.md`](docs/ci.md) describes a mechanical, LLM-free GitHub Action that runs `scan` and `gate --mechanical-only` on every pull request and sends nothing to any model provider.
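
As a rough illustration of what a mechanical, LLM-free check can look like, the sketch below finds undefined references and leftover draft markers in a TeX string. It is not papercheck's implementation: it uses plain regexes for brevity, ignores TeX comments, and the function name `mechanical_check`, the patterns, and the marker list are all assumptions made for this example.

```python
import re

# Hypothetical patterns; a real scanner would cover more commands.
LABEL_RE = re.compile(r"\\label\{([^}]+)\}")
REF_RE = re.compile(r"\\(?:eq)?ref\{([^}]+)\}")
DRAFT_RE = re.compile(r"\b(TODO|FIXME|XXX)\b")


def mechanical_check(tex: str) -> dict:
    """Report undefined references and draft markers found in a TeX string."""
    labels = set(LABEL_RE.findall(tex))
    refs = set(REF_RE.findall(tex))
    undefined = sorted(refs - labels)
    markers = DRAFT_RE.findall(tex)
    return {
        "undefined_refs": undefined,
        "draft_markers": markers,
        # Ready only when both mechanical signals are clean.
        "ready": not undefined and not markers,
    }
```

Because the result depends only on the text, the same input always yields the same verdict, which is the property that lets a check of this kind run in CI without a model.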

## Limitations

See [`docs/limitations.md`](docs/limitations.md) for the full version.

- papercheck is not a theorem prover and not a replacement for peer review.
- Catching a semantic error depends entirely on the driving model. papercheck keeps the process disciplined and traceable, but AI findings must be checked independently.
- papercheck makes no network calls. The agent you drive it with may transmit your manuscript to its model provider, so review the relevant data terms before auditing unpublished work ([`docs/privacy.md`](docs/privacy.md)).

## About

Built by Christian Garry.

<a href="https://www.linkedin.com/in/christian-tt-garry/"><img src="https://img.shields.io/badge/LinkedIn-Christian_Garry-0A66C2?logo=linkedin&logoColor=white"></a>
<a href="https://christiangarry.com"><img src="https://img.shields.io/badge/Website-christiangarry.com-22c55e"></a>

## Contributing

Contributions welcome — see [`CONTRIBUTING.md`](CONTRIBUTING.md). The architecture is fixed for the 0.x line: strictly MCP and CLI, JSON as the source of truth, gates enforced in code. Prompt changes are guarded by the eval fixtures described in [`docs/agent_eval.md`](docs/agent_eval.md).

## References

<a name="references"></a>

1. Rajesh Jayaram, Drew Tyler, David Woodruff, Corinna Cortes, Yossi Matias, Vahab Mirrokni, and Vincent Cohen-Addad. *Towards Automating Scientific Review with Google's Paper Assistant Tool.* arXiv:2606.28277, 2026. <https://arxiv.org/abs/2606.28277>

BibTeX for the above is in [`docs/references.bib`](docs/references.bib).

## License

MIT — see [`LICENSE`](LICENSE). If papercheck is useful in your work, cite it via [`CITATION.cff`](CITATION.cff).
@@ -0,0 +1,101 @@
# Build progress

A concise log of how papercheck v0.1 was built, phase by phase, with the test
count at the end of each phase.

## Phases

- **P0 — Repo bootstrap.** Package skeleton, `pyproject.toml` (hatchling),
  license, canonical `Paper_Audit/` path helpers, first smoke tests.
  *(tests: 4)*
- **P1 — Schemas + state machine.** JSON Schemas for issue / patch / segment /
  manual-check / state; the stage-gated `AuditState` with `require_at_least`.
  *(tests: 30)*
- **P2 — Scanner.** Deterministic regex TeX scanner producing `structure.json`
  (theorem envs, labels, refs, citations, draft markers). *(tests: 50, with P3)*
- **P3 — Verifiers.** Whitespace-normalized quote verifier, label/ref/cite/file
  verification, and issue intake that rejects unverifiable findings as
  `REJECTED_SOURCE_TARGET_INVALID`. *(tests: 50, with P2)*
- **P4 — Segments + gate + rendering.** Segment proposal, the mechanical final
  gate (build + label/ref/cite + draft-marker + open-blocker signals), and
  JSON→Markdown report rendering (e.g. `10_final_acceptance_gate.md`).
  *(tests: 61)*
- **P5 — MCP server.** FastMCP stdio server exposing 25 tools as thin wrappers
  over the deterministic core; `papercheck-mcp` entry point. *(tests: 73)*
- **P6 — Prompt pack + domain pack + fixtures.** Vendored prompt pack
  (`prompts/`), domain packs (`domain_packs/`), and the toy fixtures — including
  the semantic fixtures used for agent-eval. *(tests: 85)*
- **P7 — Docs + release polish.** README, `docs/` (limitations, comparison,
  privacy, agent-eval), `CITATION.cff`, `CONTRIBUTING.md`, this log, the
  non-LLM `scripts/agent_eval_report.py` helper, and `pyproject.toml` metadata
  (urls / keywords / classifiers). Packaging work brought the wheel data dirs
  and CI matrix to green. *(tests: 87, incl. packaging)*

Test-count progression: **P0: 4 → P1: 30 → P2/P3: 50 → P4: 61 → P5: 73 →
P6: 85 → packaging: 87.**
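
The stage-gated `AuditState` from P1 can be pictured roughly as follows. This is a minimal sketch, not the shipped class: the stage names are copied from the README flowchart, while `StageError`, the `advance` method, and the one-stage-at-a-time rule are assumptions added for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    """Audit stages in workflow order (names taken from the README flowchart)."""
    INIT = 0
    SCANNED = 1
    SEGMENTED = 2
    INVENTORIED = 3
    AUDITING = 4
    SYNTHESIZED = 5
    ADJUDICATED = 6
    PATCH_PLANNED = 7
    PATCHING = 8
    REGRESSED = 9
    GATED = 10


class StageError(RuntimeError):
    """Hypothetical error raised when a step runs before its prerequisite stage."""


@dataclass
class AuditState:
    stage: Stage = Stage.INIT

    def require_at_least(self, needed: Stage) -> None:
        """Refuse to proceed unless the audit has reached `needed`."""
        if self.stage < needed:
            raise StageError(f"requires {needed.name}, current stage is {self.stage.name}")

    def advance(self, target: Stage) -> None:
        """Assumed rule: only a move to the immediately following stage is allowed."""
        if int(target) != int(self.stage) + 1:
            raise StageError(f"cannot move from {self.stage.name} to {target.name}")
        self.stage = target
```

A tool handler would call `require_at_least` first, so an agent that tries to submit issues before scanning gets an error instead of a silently inconsistent ledger.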

## Current status

- **v0.1 feature-complete.** 87 mechanical tests passing.
- **Wheel ships the data dirs** (`schemas/`, `prompts/`, `templates/`,
  `domain_packs/`) via hatch force-include.
- **CI is green** — lint (`ruff`) + `pytest` + privacy check, on Ubuntu and
  Windows.

## v0.2

- **HTML report** — `papercheck report <paper_root>` writes a self-contained
  HTML audit report to `Paper_Audit/report/index.html` from JSON artifacts
  (verdict banner, issue table, segments, manual checks).
- **Version-compare** — `papercheck compare <old_root> <new_root>` produces a
  structural diff showing added/removed/changed theorems by label, abstract
  changes, and label/citation/section diffs; writes to `Paper_Audit/version_comparison.md`.
- **Audit profiles** — `papercheck profile [list|show <name>]` exposes advisory
  audit profiles (quick, arxiv, full, journal, no-cloud), each defining an
  ordered sequence of recommended steps and a mechanical_only flag. Profiles are
  advisory only; papercheck does not self-orchestrate.
- **Domain pack generator** — `papercheck packs scaffold --paper-root <root>`
  deterministically drafts a domain pack from a scanned paper's structure;
  `papercheck packs create <file.json> --paper-root <root>` validates and
  persists a filled-in pack to `Paper_Audit/domain_pack.json`. Four new MCP
  tools (`list_domain_packs`, `get_domain_pack`, `scaffold_domain_pack`,
  `create_domain_pack`) enable agents to fill in domain knowledge while
  papercheck validates and stores the result.
- **New domain packs** — shipped `pde`, `optimization`, `machine_learning`,
  `general` (fully generic) in addition to existing `stochastic_analysis`,
  `numerical_analysis`.
- **End-user GitHub Action** — composite action + example workflow
  (`.github/workflows/example-paper-audit.yml`) + `docs/ci.md` for running
  mechanical-only checks (scan, segments, gate --mechanical-only) in CI. Never
  sends the manuscript to any external service; papercheck's own dev CI is
  unchanged.
- **118 mechanical tests passing** (87 in v0.1 + 31 for new features).
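
The added/removed/changed classification used by version-compare can be sketched in a few lines. The example assumes each version has already been reduced to a mapping from theorem label to statement text; the function name `diff_by_label` and that input shape are hypothetical, not the format papercheck uses internally.

```python
def diff_by_label(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify labels as added, removed, or changed between two versions."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # A label present in both versions counts as changed when its text differs.
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}
```

Keying the diff on labels rather than on position means a theorem that merely moves within the paper is not reported as a change.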

## v0.2 — Explicitly excluded by design

These are intentionally not built:

- **LLM provider adapters** — papercheck never calls a model itself; the agent
  driving the audit supplies the reasoning.
- **Self-orchestration** — papercheck is an MCP server and CLI, not an
  autonomous agent.

## Explicitly deferred (future or out of scope)

- **LLM-in-CI evals** — Tier 2 agent-eval stays manual; CI runs no model.
- **LaTeX AST parsing** — fine-grained structural analysis via AST traversal.
- **Web UI** — the CLI and MCP tools form the user surface.

## Deviations from spec

Small, deliberate departures worth recording:

- **Handler `render` → `render_reports`.** The MCP/handler entry point is named
  `render_reports` (the CLI command stays `render`) to avoid colliding with the
  core `render` module name.
- **Id-assign before schema-validate in `issues.py`.** `submit_issue` assigns a
  missing/blank `issue_id` *before* JSON-Schema validation, because a blank id
  fails the schema's id pattern; validation then runs on the completed record.
- **`amsthm` added to the clean fixture.** `tests/fixtures/toy_clean_paper`
  loads `amsthm` so theorem environments are well-formed and the fixture builds
  cleanly under the gate's latexmk build.
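
The id-assign-before-validate ordering can be illustrated with a stand-in. Everything specific here is an assumption: the `ISS-` id pattern, the `assign_issue_id` and `validate` helpers, and the use of a single regex check in place of full JSON-Schema validation. Only the ordering itself comes from the note above.

```python
import re
import uuid

# Hypothetical id pattern; the real schema's pattern is not shown in this log.
ISSUE_ID_PATTERN = re.compile(r"ISS-[0-9a-f]{8}")


def assign_issue_id(record: dict) -> dict:
    """Return a copy of the record with an id filled in when missing or blank."""
    completed = dict(record)
    if not str(completed.get("issue_id") or "").strip():
        completed["issue_id"] = f"ISS-{uuid.uuid4().hex[:8]}"
    return completed


def validate(record: dict) -> None:
    """Stand-in for JSON-Schema validation: only the id pattern is checked."""
    issue_id = str(record.get("issue_id") or "")
    if not ISSUE_ID_PATTERN.fullmatch(issue_id):
        raise ValueError(f"invalid issue_id: {issue_id!r}")


def submit_issue(record: dict) -> dict:
    """Fill the id first, then validate the completed record."""
    completed = assign_issue_id(record)
    validate(completed)
    return completed
```

Reversing the two calls in `submit_issue` would reject every record that arrives without an id, which is the failure the deviation avoids.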