methods-mcp 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (44)
  1. methods_mcp-0.1.0/.claude/napkin.md +96 -0
  2. methods_mcp-0.1.0/.claude/settings.local.json +10 -0
  3. methods_mcp-0.1.0/.github/workflows/ci.yml +39 -0
  4. methods_mcp-0.1.0/.gitignore +43 -0
  5. methods_mcp-0.1.0/.python-version +1 -0
  6. methods_mcp-0.1.0/CHANGELOG.md +22 -0
  7. methods_mcp-0.1.0/LICENSE +21 -0
  8. methods_mcp-0.1.0/PKG-INFO +177 -0
  9. methods_mcp-0.1.0/README.md +136 -0
  10. methods_mcp-0.1.0/examples/__init__.py +0 -0
  11. methods_mcp-0.1.0/examples/claude_code_demo.md +54 -0
  12. methods_mcp-0.1.0/examples/reflexive_demo.py +134 -0
  13. methods_mcp-0.1.0/project-thoughts.md +74 -0
  14. methods_mcp-0.1.0/pyproject.toml +108 -0
  15. methods_mcp-0.1.0/src/methods_mcp/__init__.py +3 -0
  16. methods_mcp-0.1.0/src/methods_mcp/extraction/__init__.py +0 -0
  17. methods_mcp-0.1.0/src/methods_mcp/extraction/html.py +116 -0
  18. methods_mcp-0.1.0/src/methods_mcp/extraction/pdf.py +81 -0
  19. methods_mcp-0.1.0/src/methods_mcp/identifiers.py +91 -0
  20. methods_mcp-0.1.0/src/methods_mcp/llm.py +133 -0
  21. methods_mcp-0.1.0/src/methods_mcp/py.typed +0 -0
  22. methods_mcp-0.1.0/src/methods_mcp/schemas.py +167 -0
  23. methods_mcp-0.1.0/src/methods_mcp/server.py +223 -0
  24. methods_mcp-0.1.0/src/methods_mcp/sources/__init__.py +0 -0
  25. methods_mcp-0.1.0/src/methods_mcp/sources/arxiv.py +57 -0
  26. methods_mcp-0.1.0/src/methods_mcp/tools/__init__.py +0 -0
  27. methods_mcp-0.1.0/src/methods_mcp/tools/fetch.py +109 -0
  28. methods_mcp-0.1.0/src/methods_mcp/tools/methods.py +78 -0
  29. methods_mcp-0.1.0/src/methods_mcp/tools/repo.py +349 -0
  30. methods_mcp-0.1.0/src/methods_mcp/tools/summarize.py +60 -0
  31. methods_mcp-0.1.0/tests/__init__.py +0 -0
  32. methods_mcp-0.1.0/tests/conftest.py +39 -0
  33. methods_mcp-0.1.0/tests/fixtures/ar5iv_minimal.html +36 -0
  34. methods_mcp-0.1.0/tests/fixtures/arxiv_2410_01234.xml +20 -0
  35. methods_mcp-0.1.0/tests/fixtures/github_readme.json +6 -0
  36. methods_mcp-0.1.0/tests/fixtures/github_repo_meta.json +10 -0
  37. methods_mcp-0.1.0/tests/fixtures/github_tree.json +17 -0
  38. methods_mcp-0.1.0/tests/test_extraction_html.py +25 -0
  39. methods_mcp-0.1.0/tests/test_fetch_tools.py +52 -0
  40. methods_mcp-0.1.0/tests/test_identifiers.py +52 -0
  41. methods_mcp-0.1.0/tests/test_pdf_extraction.py +28 -0
  42. methods_mcp-0.1.0/tests/test_repo_heuristics.py +119 -0
  43. methods_mcp-0.1.0/tests/test_server_smoke.py +31 -0
  44. methods_mcp-0.1.0/uv.lock +1850 -0
@@ -0,0 +1,96 @@
+ # WWSF Challenge Napkin
+
+ Living doc for the Worldwide AI Science Fellowship (WWSF) Build Challenge — Flynn's tryout for the inaugural AI for Science fellowship (cohort runs Apr 15 → Jul 15, 2026).
+
+ ---
+
+ ## The Ask (one-screen)
+
+ **Challenge:** Build something new in a week using the latest AI tools. Ship a real demo. Share it in public.
+
+ **Deliverables (submit by email to Michael Raspuzzi):**
+ 1. Build-in-public post on socials (X / LinkedIn / YouTube — long or short form, Flynn's choice)
+ 2. Public GitHub link (code + docs)
+ 3. 2–3 min Loom walking through process and links
+
+ **What they're grading on:**
+ 1. Real demo — does it actually work
+ 2. Try new things — reaching into unfamiliar tools / hardware / territory
+ 3. Process + narration — how Flynn thinks and explains it
+
+ **Deadline:** 2026-04-20, 11:59 PM PT (received Apr 14 ~00:44 PT, so effectively 6.5 days)
+ **Time budget:** ≥6 hrs recommended; average fellowship cadence is 8–10 hrs/wk
+
+ **Response owed:** reply "Challenge accepted." to lock it in.
+
+ ## Example prompts they floated
+ - OpenClaw agent on a Digital Ocean droplet ($12) + dashboard or LabClaw literature review
+ - Claude Code designing a robotic protocol (Opentrons Flex sim)
+ - Arduino/ESP32 + sensor (temp, pH, turbidity) → MQTT/serial → live dashboard with threshold alerts
+ - Fork OpenSCAD/STEP file from Open Labware, parametrically modify, render, publish
+
+ (These are illustrative, not prescriptive — "figure it out" is the vibe.)
+
+ ## Flynn's edge — use this
+ - **Dr. Flynn Lachendro, PhD in Biomedical Engineering.** This is literally "AI for Science" — lean into the science side, not just the engineering side.
+ - Already ships AI products fast (Arbor = research workbench; Lattice = AI research intel dashboard; Layers = pop-sci mobile; Model Collapse = prompt-engineering roguelike).
+ - Strongest stack: **Python + FastAPI + SQLAlchemy async + Pydantic + React/Next + TS**. Uses Anthropic / OpenAI / Gemini / OpenRouter regularly.
+ - Background in research pipelines (arXiv, Semantic Scholar, paper summarization, ingestion crons) → natural fit for LabClaw-style science agents.
+
+ ## What "try new things" means here
+ To avoid this being "Flynn ships another web app," reach into at least one of:
+ - Real lab hardware / simulators (Opentrons Flex sim, OpenClaw, OT-2 protocol API)
+ - Physical-world I/O (ESP32 / Arduino / Pi, sensors, MQTT)
+ - Agentic lab workflows (LabClaw skills, MCP servers for scientific tooling)
+ - Parametric CAD / Open Labware (OpenSCAD, STEP manipulation)
+ - World models or VLA (vision-language-action) perception
+
+ ---
+
+ ## Flynn Preferences (general, durable)
+
+ **Identity**
+ - Dr. Flynn Lachendro — PhD Biomedical Engineering, AI engineer/researcher, Founding Engineer at Nur Opus (Prism). Faculty / OpenKit background. Focus: multi-agent systems, LLM eval, AI safety.
+
+ **Collaboration style**
+ - **Always wait for explicit go-ahead** before git commit/push, PRs, or any irreversible action. "Should I do X?" ≠ "do X."
+ - No destructive-first instincts. `rm -rf`, recreating environments, `--no-verify` — only as a last resort, after alternatives fail.
+ - No legacy wrappers / backwards-compat shims for internal refactors. Update call sites directly.
+ - Commit messages: short, single-line. No `Co-Authored-By`, no multi-paragraph bodies.
+
+ **Explanation style (two modes)**
+ - **Flynn style:** super concise, 2 ways to explain, simple analogies, minimal code, get the knack across. Used when Flynn says "explain Flynn style."
+ - **Claude style:** full clean written explanation in the response, then call the `say_tts` MCP (default voice) to read it aloud. Used when Flynn says "explain Claude style" or "read out Claude style."
+
+ **Package / tooling**
+ - `uv` for all Python. Avoid venvs.
+ - `ruff format .` → `ruff check . --fix` → `mypy <pkg>` → `pytest -v` is the "Linting, Typing and Checking" sequence.
+ - Stack defaults: FastAPI + SQLAlchemy 2.0 async + Pydantic + loguru (BE). Next.js + React + TS + Tailwind (web). Expo + RN + Reanimated (mobile). Supabase (auth + Postgres). Vercel (FE) + Railway (BE).
+
+ **Code snippets**
+ - When showing DB models / architectural primitives: full type annotations, both sides of relationships, `__tablename__` included. No pseudocode for real DB models.
+
+ **Root-markdown refresh**
+ - When Flynn says "refresh the root markdowns": review root `.md`s against current code, make surgical updates (preserve style/tone), don't create new ones, report what changed.
+
+ **Granola**
+ - "Look through my Granola" → read `~/Library/Application Support/Granola/cache-v6.json`. `cache.state.documents` = meetings, `cache.state.transcripts` = full text keyed by doc ID.
+
+ ---
+
+ ## Build Log
+ *(fill in as we go — what was tried, what worked, what got cut, links to commits / tweets / Loom)*
+
+ - 2026-04-14 — Challenge received. Napkin set up. Planning session in progress.
+ - 2026-04-14 — Plan locked: build a FastMCP server primitive, demo via Claude Code live, audience = AI-for-science community (cross-domain).
+ - 2026-04-14 — Pivot: `paper-mcp` already exists on PyPI (Bhvaik). Paper2Agent (Stanford) does batch methods+repro. **Wedge: lightweight, on-demand, real-time MCP for methods extraction + reproducibility *heuristics*** — no full re-execution required. Package name: **`methods-mcp`**.
+
+ ## Open Questions / Decisions
+ *(TBD after the plan-mode interview)*
+
+ ## Submission Checklist
+ - [ ] Reply "Challenge accepted." to Michael Raspuzzi
+ - [ ] GitHub repo public with README / process docs
+ - [ ] Build-in-public post drafted + posted (X / LinkedIn / YouTube)
+ - [ ] 2–3 min Loom recorded + link ready
+ - [ ] Submission email sent to Michael by **2026-04-20 23:59 PT**
@@ -0,0 +1,10 @@
+ {
+   "permissions": {
+     "allow": [
+       "WebFetch(domain:labclaw-ai.github.io)",
+       "WebFetch(domain:docs.opentrons.com)",
+       "WebFetch(domain:gofastmcp.com)",
+       "WebFetch(domain:pypi.org)"
+     ]
+   }
+ }
@@ -0,0 +1,39 @@
+ name: CI
+
+ on:
+   push:
+     branches: [main]
+   pull_request:
+     branches: [main]
+
+ jobs:
+   test:
+     runs-on: ubuntu-latest
+     strategy:
+       matrix:
+         python-version: ["3.11", "3.12", "3.13"]
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Install uv
+         uses: astral-sh/setup-uv@v3
+         with:
+           enable-cache: true
+
+       - name: Set up Python ${{ matrix.python-version }}
+         run: uv python install ${{ matrix.python-version }}
+
+       - name: Install dependencies
+         run: uv sync --extra dev --extra agent --python ${{ matrix.python-version }}
+
+       - name: Ruff format check
+         run: uv run ruff format --check .
+
+       - name: Ruff lint
+         run: uv run ruff check .
+
+       - name: Mypy
+         run: uv run mypy src
+
+       - name: Pytest
+         run: uv run pytest tests -v
@@ -0,0 +1,43 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ *.egg-info/
+ *.egg
+ build/
+ dist/
+ wheels/
+ .venv/
+ venv/
+ env/
+
+ # uv
+ uv.lock.bak
+
+ # Testing / coverage
+ .pytest_cache/
+ .coverage
+ htmlcov/
+ .tox/
+ .nox/
+ .mypy_cache/
+ .ruff_cache/
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ .DS_Store
+
+ # Env / secrets
+ .env
+ .env.local
+ .env.*.local
+ *.pem
+
+ # Project-specific
+ fixtures_cache/
+ *.local.md
@@ -0,0 +1 @@
+ 3.11
@@ -0,0 +1,22 @@
+ # Changelog
+
+ ## 0.1.0 — 2026-04-14
+
+ Initial release. Built for the Worldwide AI Science Fellowship build challenge.
+
+ ### Tools shipped
+
+ - `health` — server liveness / config check
+ - `get_paper_metadata` — URL/ID/DOI → canonical metadata
+ - `fetch_paper_text` — ar5iv HTML primary, pypdf fallback
+ - `extract_methods` — LLM + Pydantic structured methods
+ - `find_code_repo` — paper text → abstract → Papers With Code
+ - `assess_repo_reproducibility` — heuristic, no-clone, GitHub REST API
+ - `summarize_paper` — three modes (abstract / tldr / exec)
+ - `methods_repro_review` — composite tool
+
+ ### Notes
+
+ - Default model: `claude-sonnet-4-6` (override with the `METHODS_MCP_MODEL` env var).
+ - All tool returns are Pydantic v2 models.
+ - 29 offline tests via respx mocks, mypy strict, ruff lint clean.
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Flynn Lachendro
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,177 @@
+ Metadata-Version: 2.4
+ Name: methods-mcp
+ Version: 0.1.0
+ Summary: Lightweight, on-demand MCP server for structured methods extraction and reproducibility heuristics on academic papers.
+ Project-URL: Homepage, https://github.com/FlynnLachendro/methods-mcp
+ Project-URL: Repository, https://github.com/FlynnLachendro/methods-mcp
+ Project-URL: Issues, https://github.com/FlynnLachendro/methods-mcp/issues
+ Author-email: Flynn Lachendro <flynnlachendro@hotmail.co.uk>
+ Maintainer-email: Flynn Lachendro <flynnlachendro@hotmail.co.uk>
+ License-Expression: MIT
+ License-File: LICENSE
+ Keywords: academic-papers,ai-for-science,anthropic,claude,fastmcp,mcp,model-context-protocol,reproducibility
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: Intended Audience :: Science/Research
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Topic :: Scientific/Engineering
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
+ Classifier: Typing :: Typed
+ Requires-Python: >=3.11
+ Requires-Dist: anthropic>=0.40.0
+ Requires-Dist: fastmcp>=2.0.0
+ Requires-Dist: httpx>=0.27.0
+ Requires-Dist: loguru>=0.7.0
+ Requires-Dist: pydantic>=2.6.0
+ Requires-Dist: pypdf>=4.0.0
+ Requires-Dist: selectolax>=0.3.20
+ Provides-Extra: agent
+ Requires-Dist: claude-agent-sdk>=0.1.0; extra == 'agent'
+ Provides-Extra: dev
+ Requires-Dist: mypy>=1.10.0; extra == 'dev'
+ Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
+ Requires-Dist: pytest>=8.0.0; extra == 'dev'
+ Requires-Dist: respx>=0.21.0; extra == 'dev'
+ Requires-Dist: ruff>=0.6.0; extra == 'dev'
+ Description-Content-Type: text/markdown
+
+ # methods-mcp
+
+ [![PyPI](https://img.shields.io/pypi/v/methods-mcp.svg)](https://pypi.org/project/methods-mcp/)
+ [![Python](https://img.shields.io/pypi/pyversions/methods-mcp.svg)](https://pypi.org/project/methods-mcp/)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+
+ > Lightweight, on-demand MCP server for **structured methods extraction** + **reproducibility heuristics** on academic papers. Built for the [Worldwide AI Science Fellowship](https://www.aisciencesummit.com/) build challenge.
+
+ `methods-mcp` is a small, sharply scoped [Model Context Protocol](https://modelcontextprotocol.io) server. It gives any AI agent (Claude Code, Claude Desktop, your Agent SDK script, etc.) eight tools that turn an academic paper URL into:
+
+ - canonical metadata,
+ - best-effort full text + section split,
+ - a **Pydantic-validated structured methods object** (steps / reagents / equipment / analyses),
+ - the paper's associated **code repository** (best-effort discovery),
+ - a **no-execution-required reproducibility verdict** for that repo, and
+ - a multi-mode summary.
+
+ The wedge: heavyweight pipelines like [Paper2Agent](https://arxiv.org/abs/2509.06917) (Stanford) take 30 minutes to hours to digest a paper into agent-ready tools. `methods-mcp` is the **agent-callable, on-demand** complement — every tool returns in seconds, no clone, no execution.
+
+ ---
+
+ ## Install
+
+ ```bash
+ uv add methods-mcp
+ # or, install globally:
+ uv tool install methods-mcp
+ # or, classic pip:
+ pip install methods-mcp
+ ```
+
+ Set your Anthropic API key (used by the LLM-driven extraction tools):
+
+ ```bash
+ export ANTHROPIC_API_KEY=sk-ant-...
+ # Optional — raises GitHub REST API rate limit for repro assessment:
+ export GITHUB_TOKEN=ghp_...
+ ```
+
+ ## Use it from Claude Code
+
+ ```
+ /mcp add methods-mcp methods-mcp
+ ```
+
+ Then in any Claude Code chat:
+
+ > Take https://arxiv.org/abs/2509.06917 and run `methods_repro_review`. Summarise what the paper does, the methods steps, and how reproducible the repo looks.
+
+ See [`examples/claude_code_demo.md`](examples/claude_code_demo.md) for more session prompts.
+
+ ## Use it from the Claude Agent SDK
+
+ ```python
+ from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient
+
+ options = ClaudeAgentOptions(
+     mcp_servers={
+         "methods-mcp": {
+             "type": "stdio",
+             "command": "methods-mcp",
+             "args": [],
+         }
+     },
+     allowed_tools=["mcp__methods-mcp__methods_repro_review"],
+ )
+
+ async with ClaudeSDKClient(options=options) as client:
+     await client.query(
+         "Run methods_repro_review on https://arxiv.org/abs/2509.06917 "
+         "and tell me whether the repo looks reproducible."
+     )
+     async for msg in client.receive_response():
+         print(msg)
+ ```
+
+ A complete runnable example lives in [`examples/reflexive_demo.py`](examples/reflexive_demo.py).
+
+ ## Tools
+
+ | Tool | What it does |
+ |---|---|
+ | `health` | Server liveness + config check. |
+ | `get_paper_metadata(input_str)` | Resolve URL / arXiv ID / DOI to canonical metadata. arXiv inputs hit the arXiv export API for title/authors/abstract. |
+ | `fetch_paper_text(input_str, prefer="auto"\|"html"\|"pdf")` | Full text + section split. Defaults to ar5iv HTML for arXiv papers (cheap, structured), PDF fallback otherwise. |
+ | `extract_methods(input_str, model=None)` | LLM-driven, Pydantic-validated structured methods extraction. Returns `{steps, reagents, equipment, analyses, confidence}`. |
+ | `find_code_repo(input_str)` | Discover the paper's code repo via paper text → abstract → Papers With Code. |
+ | `assess_repo_reproducibility(repo_url, paper_id=None)` | Heuristic, no-clone reproducibility assessment via the GitHub REST API. Weighted signals (README, deps, fixtures, notebooks, figure scripts, recent maintenance, license) → `{verdict, score, recommended_entrypoint}`. |
+ | `summarize_paper(input_str, mode="tldr"\|"abstract"\|"exec")` | LLM summary in three depths. |
+ | `methods_repro_review(input_str)` | Composite — metadata + methods + repo + repro in one call. |
+
+ All tools return Pydantic v2 models (validated, JSON-serialisable). See [`src/methods_mcp/schemas.py`](src/methods_mcp/schemas.py) for the full type surface.
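The URL / arXiv ID / DOI resolution behind `get_paper_metadata` can be sketched with stdlib regexes. This is an illustrative stand-in, not the package's actual `identifiers.py`; the patterns, the `resolve` name, and the `("kind", id)` return shape are assumptions.

```python
import re

# New-style arXiv IDs (YYMM.NNNNN), optionally behind a URL or "arXiv:" prefix.
ARXIV_ID = re.compile(
    r"(?:arxiv\.org/(?:abs|pdf)/|arXiv:)?(\d{4}\.\d{4,5})(?:v\d+)?", re.I
)
# DOIs: "10.<registrant>/<suffix>".
DOI = re.compile(r"(10\.\d{4,9}/[^\s\"<>]+)", re.I)


def resolve(input_str: str) -> tuple[str, str]:
    """Classify an input as ('arxiv', id), ('doi', id), or ('url', input)."""
    s = input_str.strip()
    m = ARXIV_ID.search(s)
    # Guard: only treat as arXiv if the string mentions arXiv or IS a bare ID,
    # so DOI suffixes containing digit runs don't get misclassified.
    if m and ("arxiv" in s.lower() or re.fullmatch(r"\d{4}\.\d{4,5}(v\d+)?", s)):
        return ("arxiv", m.group(1))
    m = DOI.search(s)
    if m:
        return ("doi", m.group(1))
    return ("url", s)


print(resolve("https://arxiv.org/abs/2509.06917"))  # ('arxiv', '2509.06917')
print(resolve("10.1038/s41586-024-00001-1"))        # ('doi', '10.1038/s41586-024-00001-1')
```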
+
+ ## Design notes
+
+ - **`extract_methods` uses Anthropic tool-use to coerce the model into emitting an instance of the `MethodsStructured` Pydantic schema.** On validation failure we send one repair message with the validation error and try again before raising.
+ - **`assess_repo_reproducibility` does not clone or execute anything.** It scores the repo from publicly-readable GitHub metadata + the recursive tree listing. This is the deliberate wedge against batch tools that try to actually rerun the paper.
+ - **`fetch_paper_text` prefers ar5iv HTML over PDF parsing for arXiv papers.** Falls back to `pypdf` for non-arXiv inputs.
+ - **The default model is `claude-sonnet-4-6`.** Override via `METHODS_MCP_MODEL` env var or per-call `model=` arg.
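The ar5iv-HTML preference above implies a section-splitting step. Here is a deliberately simplified stdlib stand-in for the selectolax-based extraction in `extraction/html.py`; real ar5iv markup (nested tags, math, figure captions) needs more care than a regex pass.

```python
import re
from html import unescape


def split_sections(html: str) -> dict[str, str]:
    """Crude section splitter: break a document on <h2...>...</h2> headings.

    A stand-in for the real selectolax-based extraction; assumes section
    titles sit in <h2> elements, which is true of typical ar5iv output
    but not guaranteed.
    """
    parts = re.split(r"<h2[^>]*>(.*?)</h2>", html, flags=re.S)
    sections: dict[str, str] = {}
    # parts = [preamble, title1, body1, title2, body2, ...]
    for title, body in zip(parts[1::2], parts[2::2]):
        text = unescape(re.sub(r"<[^>]+>", " ", body))  # strip remaining tags
        sections[unescape(title).strip()] = " ".join(text.split())
    return sections


doc = "<h2>Methods</h2><p>We trained a model.</p><h2>Results</h2><p>It worked.</p>"
print(split_sections(doc))
```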
+
+ ## Pair with `paper-mcp`
+
+ For broader paper search / citation graph tooling, run [`paper-mcp` (Bhvaik)](https://pypi.org/project/paper-mcp/) alongside in the same Claude Code session. `paper-mcp` does title-keyed search, full-text fetch, citations, and references; `methods-mcp` adds the structured-methods + reproducibility layer on top. The two were intentionally designed to compose.
+
+ ## Develop locally
+
+ ```bash
+ git clone https://github.com/FlynnLachendro/methods-mcp
+ cd methods-mcp
+ uv sync --extra dev --extra agent
+
+ uv run pytest        # 29 tests, offline
+ uv run ruff format .
+ uv run ruff check . --fix
+ uv run mypy src
+
+ uv run methods-mcp --help
+ ```
+
+ ## Project notes
+
+ `project-thoughts.md` (in this repo) is a running log of what we tried, what stuck, and what we cut while building this. Honest write-up for the WWSF Loom narration.
+
+ ## License
+
+ MIT — see [`LICENSE`](LICENSE).
+
+ ## Acknowledgements
+
+ Built for the [Worldwide AI Science Fellowship](https://www.aisciencesummit.com/) inaugural cohort. Thanks to Michael Raspuzzi for the open-ended brief.
+
+ Built on:
+ - [FastMCP 3.x](https://github.com/jlowin/fastmcp) — the MCP server scaffold.
+ - [Claude Agent SDK](https://github.com/anthropics/claude-agent-sdk-python) — the agent loop in the demo.
+ - [ar5iv.labs.arxiv.org](https://ar5iv.labs.arxiv.org/) — clean HTML for arXiv papers.
+ - [Anthropic Claude](https://platform.claude.com) — the LLM behind structured extraction.
@@ -0,0 +1,136 @@
+ # methods-mcp
+
+ [![PyPI](https://img.shields.io/pypi/v/methods-mcp.svg)](https://pypi.org/project/methods-mcp/)
+ [![Python](https://img.shields.io/pypi/pyversions/methods-mcp.svg)](https://pypi.org/project/methods-mcp/)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+
+ > Lightweight, on-demand MCP server for **structured methods extraction** + **reproducibility heuristics** on academic papers. Built for the [Worldwide AI Science Fellowship](https://www.aisciencesummit.com/) build challenge.
+
+ `methods-mcp` is a small, sharply scoped [Model Context Protocol](https://modelcontextprotocol.io) server. It gives any AI agent (Claude Code, Claude Desktop, your Agent SDK script, etc.) eight tools that turn an academic paper URL into:
+
+ - canonical metadata,
+ - best-effort full text + section split,
+ - a **Pydantic-validated structured methods object** (steps / reagents / equipment / analyses),
+ - the paper's associated **code repository** (best-effort discovery),
+ - a **no-execution-required reproducibility verdict** for that repo, and
+ - a multi-mode summary.
+
+ The wedge: heavyweight pipelines like [Paper2Agent](https://arxiv.org/abs/2509.06917) (Stanford) take 30 minutes to hours to digest a paper into agent-ready tools. `methods-mcp` is the **agent-callable, on-demand** complement — every tool returns in seconds, no clone, no execution.
+
+ ---
+
+ ## Install
+
+ ```bash
+ uv add methods-mcp
+ # or, install globally:
+ uv tool install methods-mcp
+ # or, classic pip:
+ pip install methods-mcp
+ ```
+
+ Set your Anthropic API key (used by the LLM-driven extraction tools):
+
+ ```bash
+ export ANTHROPIC_API_KEY=sk-ant-...
+ # Optional — raises GitHub REST API rate limit for repro assessment:
+ export GITHUB_TOKEN=ghp_...
+ ```
+
+ ## Use it from Claude Code
+
+ ```
+ /mcp add methods-mcp methods-mcp
+ ```
+
+ Then in any Claude Code chat:
+
+ > Take https://arxiv.org/abs/2509.06917 and run `methods_repro_review`. Summarise what the paper does, the methods steps, and how reproducible the repo looks.
+
+ See [`examples/claude_code_demo.md`](examples/claude_code_demo.md) for more session prompts.
+
+ ## Use it from the Claude Agent SDK
+
+ ```python
+ from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient
+
+ options = ClaudeAgentOptions(
+     mcp_servers={
+         "methods-mcp": {
+             "type": "stdio",
+             "command": "methods-mcp",
+             "args": [],
+         }
+     },
+     allowed_tools=["mcp__methods-mcp__methods_repro_review"],
+ )
+
+ async with ClaudeSDKClient(options=options) as client:
+     await client.query(
+         "Run methods_repro_review on https://arxiv.org/abs/2509.06917 "
+         "and tell me whether the repo looks reproducible."
+     )
+     async for msg in client.receive_response():
+         print(msg)
+ ```
+
+ A complete runnable example lives in [`examples/reflexive_demo.py`](examples/reflexive_demo.py).
+
+ ## Tools
+
+ | Tool | What it does |
+ |---|---|
+ | `health` | Server liveness + config check. |
+ | `get_paper_metadata(input_str)` | Resolve URL / arXiv ID / DOI to canonical metadata. arXiv inputs hit the arXiv export API for title/authors/abstract. |
+ | `fetch_paper_text(input_str, prefer="auto"\|"html"\|"pdf")` | Full text + section split. Defaults to ar5iv HTML for arXiv papers (cheap, structured), PDF fallback otherwise. |
+ | `extract_methods(input_str, model=None)` | LLM-driven, Pydantic-validated structured methods extraction. Returns `{steps, reagents, equipment, analyses, confidence}`. |
+ | `find_code_repo(input_str)` | Discover the paper's code repo via paper text → abstract → Papers With Code. |
+ | `assess_repo_reproducibility(repo_url, paper_id=None)` | Heuristic, no-clone reproducibility assessment via the GitHub REST API. Weighted signals (README, deps, fixtures, notebooks, figure scripts, recent maintenance, license) → `{verdict, score, recommended_entrypoint}`. |
+ | `summarize_paper(input_str, mode="tldr"\|"abstract"\|"exec")` | LLM summary in three depths. |
+ | `methods_repro_review(input_str)` | Composite — metadata + methods + repo + repro in one call. |
+
+ All tools return Pydantic v2 models (validated, JSON-serialisable). See [`src/methods_mcp/schemas.py`](src/methods_mcp/schemas.py) for the full type surface.
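The weighted-signal scoring described in the `assess_repo_reproducibility` row above amounts to a weighted sum of booleans. The weights, thresholds, and verdict labels in this sketch are illustrative assumptions, not the package's actual values.

```python
# Illustrative weights — NOT the package's actual values.
SIGNALS = {
    "has_readme": 0.20,
    "has_deps_file": 0.20,        # requirements.txt / pyproject.toml / environment.yml
    "has_tests_or_fixtures": 0.15,
    "has_notebooks": 0.10,
    "has_figure_scripts": 0.10,
    "recently_maintained": 0.15,  # e.g. pushed within the last 18 months
    "has_license": 0.10,
}


def verdict(signals: dict[str, bool]) -> tuple[float, str]:
    """Weighted sum of boolean signals -> (score, verdict label)."""
    score = sum(w for name, w in SIGNALS.items() if signals.get(name))
    if score >= 0.7:
        label = "likely-reproducible"
    elif score >= 0.4:
        label = "partially-reproducible"
    else:
        label = "hard-to-reproduce"
    return round(score, 2), label


print(verdict({
    "has_readme": True,
    "has_deps_file": True,
    "recently_maintained": True,
    "has_license": True,
}))  # (0.65, 'partially-reproducible')
```

The signal booleans themselves would come from one GitHub metadata call plus one recursive tree listing, which is what keeps the tool no-clone and fast.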
+
+ ## Design notes
+
+ - **`extract_methods` uses Anthropic tool-use to coerce the model into emitting an instance of the `MethodsStructured` Pydantic schema.** On validation failure we send one repair message with the validation error and try again before raising.
+ - **`assess_repo_reproducibility` does not clone or execute anything.** It scores the repo from publicly-readable GitHub metadata + the recursive tree listing. This is the deliberate wedge against batch tools that try to actually rerun the paper.
+ - **`fetch_paper_text` prefers ar5iv HTML over PDF parsing for arXiv papers.** Falls back to `pypdf` for non-arXiv inputs.
+ - **The default model is `claude-sonnet-4-6`.** Override via `METHODS_MCP_MODEL` env var or per-call `model=` arg.
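The validate-then-repair-once pattern from the first design note can be sketched in stdlib Python. `call_llm` and the validator below are stand-ins for the Anthropic tool-use call and the `MethodsStructured` Pydantic validation; the real code differs in its messages and schema.

```python
from typing import Any, Callable


class ValidationFailed(Exception):
    pass


def extract_with_repair(
    call_llm: Callable[[str], dict[str, Any]],
    validate: Callable[[dict[str, Any]], dict[str, Any]],
    prompt: str,
) -> dict[str, Any]:
    """One attempt, then a single repair pass carrying the validation error."""
    raw = call_llm(prompt)
    try:
        return validate(raw)
    except ValidationFailed as err:
        repaired = call_llm(
            f"{prompt}\n\nYour last output failed validation: {err}. Fix it."
        )
        return validate(repaired)  # a second failure propagates to the caller


# Stand-ins for demo purposes: first attempt invalid, second valid.
attempts = iter([{"steps": None}, {"steps": ["mix", "incubate"]}])


def fake_llm(prompt: str) -> dict[str, Any]:
    return next(attempts)


def fake_validate(d: dict[str, Any]) -> dict[str, Any]:
    if not isinstance(d.get("steps"), list):
        raise ValidationFailed("steps must be a list")
    return d


print(extract_with_repair(fake_llm, fake_validate, "Extract the methods."))
# {'steps': ['mix', 'incubate']}
```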
+
+ ## Pair with `paper-mcp`
+
+ For broader paper search / citation graph tooling, run [`paper-mcp` (Bhvaik)](https://pypi.org/project/paper-mcp/) alongside in the same Claude Code session. `paper-mcp` does title-keyed search, full-text fetch, citations, and references; `methods-mcp` adds the structured-methods + reproducibility layer on top. The two were intentionally designed to compose.
+
+ ## Develop locally
+
+ ```bash
+ git clone https://github.com/FlynnLachendro/methods-mcp
+ cd methods-mcp
+ uv sync --extra dev --extra agent
+
+ uv run pytest        # 29 tests, offline
+ uv run ruff format .
+ uv run ruff check . --fix
+ uv run mypy src
+
+ uv run methods-mcp --help
+ ```
+
+ ## Project notes
+
+ `project-thoughts.md` (in this repo) is a running log of what we tried, what stuck, and what we cut while building this. Honest write-up for the WWSF Loom narration.
+
+ ## License
+
+ MIT — see [`LICENSE`](LICENSE).
+
+ ## Acknowledgements
+
+ Built for the [Worldwide AI Science Fellowship](https://www.aisciencesummit.com/) inaugural cohort. Thanks to Michael Raspuzzi for the open-ended brief.
+
+ Built on:
+ - [FastMCP 3.x](https://github.com/jlowin/fastmcp) — the MCP server scaffold.
+ - [Claude Agent SDK](https://github.com/anthropics/claude-agent-sdk-python) — the agent loop in the demo.
+ - [ar5iv.labs.arxiv.org](https://ar5iv.labs.arxiv.org/) — clean HTML for arXiv papers.
+ - [Anthropic Claude](https://platform.claude.com) — the LLM behind structured extraction.
@@ -0,0 +1,54 @@
+ # Using methods-mcp from Claude Code
+
+ This is the path the WWSF Loom demonstrates: install the server, point Claude Code at it, ask a science question.
+
+ ## 1. Install
+
+ ```bash
+ uv add methods-mcp  # in any uv-managed project
+ # or globally:
+ uv tool install methods-mcp
+ ```
+
+ ## 2. Tell Claude Code about the server
+
+ In a Claude Code session:
+
+ ```
+ /mcp add methods-mcp methods-mcp
+ ```
+
+ (Form: `/mcp add <name> <command>`. The second `methods-mcp` is the console-script binary, which lives on your PATH after `uv tool install`, or inside any project that ran `uv add methods-mcp` + `uv sync`.)
+
+ Set your Anthropic key (used for the LLM-driven extraction tools):
+
+ ```bash
+ export ANTHROPIC_API_KEY=sk-ant-...
+ # Optional, raises GitHub API rate limit for repro assessment:
+ export GITHUB_TOKEN=ghp_...
+ ```
+
+ ## 3. Ask Claude Code a question that uses the tools
+
+ Try one of these:
+
+ > Take https://arxiv.org/abs/2509.06917 and run methods_repro_review. Summarise what the paper does, the methods steps, and how reproducible the repo looks.
+
+ > Compare the reproducibility of the code repos for these three papers: <url1>, <url2>, <url3>. Which has the strongest reproducibility signals?
+
+ > Pull the structured methods from this protocol paper [URL] and turn it into a checklist I could hand to a colleague.
+
+ Watch the tool calls fire in the Claude Code panel — that's the demo.
+
+ ## 4. Tools exposed
+
+ | Tool | What it does |
+ |---|---|
+ | `health` | Liveness + config check. |
+ | `get_paper_metadata` | Resolve URL/ID/DOI to canonical metadata. |
+ | `fetch_paper_text` | Best-effort full text + section split. |
+ | `extract_methods` | LLM + Pydantic structured methods extraction. |
+ | `find_code_repo` | Discover the paper's GitHub repo. |
+ | `assess_repo_reproducibility` | No-clone heuristic repro assessment. |
+ | `summarize_paper` | Three modes: abstract / tldr / exec. |
+ | `methods_repro_review` | Composite: metadata + methods + repo + repro in one call. |
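The third sample prompt asks Claude to turn structured methods into a checklist; mechanically that is a small transform over the `extract_methods` payload. The sketch below assumes the `{steps, reagents, equipment}` field names from the tools table; the exact step and reagent shapes in the real Pydantic models may differ.

```python
def methods_to_checklist(methods: dict) -> str:
    """Render an extract_methods-style payload as a markdown checklist.

    Field names here are assumptions about the schema, not the
    package's exact Pydantic models.
    """
    lines = ["## Protocol checklist", ""]
    for item in methods.get("reagents", []):
        lines.append(f"- [ ] Reagent ready: {item}")
    for item in methods.get("equipment", []):
        lines.append(f"- [ ] Equipment ready: {item}")
    for i, step in enumerate(methods.get("steps", []), 1):
        lines.append(f"- [ ] Step {i}: {step}")
    return "\n".join(lines)


demo = {
    "steps": ["Lyse cells", "Run PCR"],
    "reagents": ["Taq polymerase"],
    "equipment": ["Thermocycler"],
}
print(methods_to_checklist(demo))
```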