paperpipe 0.1.0__py3-none-any.whl

AGENT_INTEGRATION.md ADDED
@@ -0,0 +1,84 @@
1
+ # Agent Integration Snippet (PaperPipe)
2
+
3
+ Add this section to your project's agent instructions file:
4
+ - Preferred: `AGENTS.md`
5
+ - Also works: `CLAUDE.md`, `GEMINI.md`, or your agent’s equivalent
6
+
7
+ ---
8
+
9
+ ## Paper References (PaperPipe)
10
+
11
+ This project implements methods from scientific papers. Papers are managed via `papi` (paperpipe).
12
+
13
+ ### Paper Database Location
14
+
15
+ Default database root is `~/.paperpipe/`, but it may be overridden (e.g. via `PAPER_DB_PATH`).
16
+ Prefer discovering the active location with:
17
+
18
+ ```bash
19
+ papi path
20
+ ```
21
+
22
+ Per-paper files live at: `<paper_db>/papers/{paper}/`
23
+
24
+ - `meta.json` — metadata + tags
25
+ - `summary.md` — coding-context overview
26
+ - `equations.md` — key equations + explanations (best for implementation verification)
27
+ - `source.tex` — full LaTeX (if available)
28
+ - `paper.pdf` — PDF (used by PaperQA2)
29
+
30
+ ### When to Use What
31
+
32
+ | Task | Best source |
33
+ |------|-------------|
34
+ | “Does my code match the paper?” | Read `{paper}/equations.md` (and/or `{paper}/source.tex`) |
35
+ | “What’s the high-level approach?” | Read `{paper}/summary.md` |
36
+ | “Find the exact formulation / definitions” | Read `{paper}/source.tex` |
37
+ | “Which papers discuss X?” | Run `papi search "X"` (fast) or `papi ask "X"` (PaperQA2) |
38
+ | “Compare methods across papers” | Load multiple `{paper}/equations.md` files |
39
+ | “Do the generated summaries/equations look sane?” | Run `papi audit` (and optionally regenerate flagged papers) |
40
+
41
+ ### Useful Commands
42
+
43
+ ```bash
44
+ # List papers and tags
45
+ papi list
46
+ papi tags
47
+
48
+ # Search by title, tag, or content
49
+ papi search "sdf loss"
50
+
51
+ # Export equations/summaries into the repo for a coding session
52
+ papi export neuralangelo neus --level equations --to ./paper-context/
53
+
54
+ # Or print directly to stdout for pasting into a terminal agent session
55
+ papi show neuralangelo neus --level eq
56
+
57
+ # Add papers (arXiv) / regenerate; use --no-llm to avoid LLM calls
58
+ papi add 2303.13476 # name auto-generated
59
+ papi add 2303.13476 --name neuralangelo # or explicit name
60
+ papi add 2303.13476 --update # refresh existing paper in-place
61
+ papi add 2303.13476 --duplicate # add a second copy (-2/-3 suffix)
62
+ papi regenerate neuralangelo --no-llm
63
+
64
+ # Audit generated content for obvious issues (and optionally regenerate flagged papers)
65
+ papi audit
66
+ papi audit --limit 5 --seed 0
67
+ papi audit --regenerate --no-llm -o summary,equations,tags
68
+ ```
69
+
70
+ ### LLM Configuration (Optional)
71
+
72
+ ```bash
73
+ export PAPERPIPE_LLM_MODEL="gemini/gemini-3-flash-preview" # any LiteLLM identifier
74
+ export PAPERPIPE_LLM_TEMPERATURE=0.3 # default: 0.3
75
+ ```
76
+
77
+ Without an LLM, paperpipe falls back to metadata, section headings, and regex-based equation extraction.
78
+
79
+ ### Code Verification Workflow
80
+
81
+ 1. Identify the referenced paper(s) (comments, function names, README, etc.)
82
+ 2. Read `{paper}/equations.md` and compare symbol-by-symbol with the implementation
83
+ 3. If ambiguous, confirm definitions/assumptions in `{paper}/source.tex`
84
+ 4. If the question is broad or spans multiple papers, run `papi ask "..."` (requires PaperQA2)
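Steps 2 and 3 of the workflow above read `equations.md` first and fall back to `source.tex`, which the file list marks as "if available". A minimal sketch of that lookup follows. `verification_sources` is a hypothetical helper, not a `papi` command, and it takes the database root as an argument rather than assuming a location.

```bash
# Sketch only: verification_sources is a hypothetical helper, not part of papi.
# Prints the files to read for one paper, in workflow order:
# equations.md first, then source.tex when the database has it.
# Usage: verification_sources <db_root> <paper_name>
verification_sources() (
  db_root="$1"
  paper="$2"
  dir="$db_root/papers/$paper"
  for f in equations.md source.tex; do
    if [ -f "$dir/$f" ]; then
      printf '%s\n' "$dir/$f"
    fi
  done
)
```

With paperpipe installed, a call such as `verification_sources "$(papi path)" neuralangelo` would list the files for that paper. An empty result means neither file is stored.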
LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Matthias Humt
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
README.md ADDED
@@ -0,0 +1,398 @@
1
+ # paperpipe
2
+
3
+ A unified paper database for coding agents + [PaperQA2](https://github.com/Future-House/paper-qa).
4
+
5
+ **The problem:** You want AI coding assistants (Claude Code, Codex CLI, Gemini CLI) to reference scientific papers while implementing algorithms. But:
6
+ - PDFs are token-heavy and lose equation fidelity
7
+ - PaperQA2 is great for research but not optimized for code verification
8
+ - There is no simple way to ask "does my code match equation 7?"
9
+
10
+ **The solution:** A local database that stores:
11
+ - PDFs (for PaperQA2 RAG queries)
12
+ - LaTeX source (for exact equation comparison)
13
+ - Summaries optimized for coding context
14
+ - Extracted equations with explanations
15
+
16
+ ## Installation
17
+
18
+ ### With uv (recommended)
19
+
20
+ ```bash
21
+ # Basic installation
22
+ uv pip install paperpipe
23
+
24
+ # With LLM support (for better summaries/equations)
25
+ uv pip install 'paperpipe[llm]'
26
+
27
+ # With PaperQA2 integration
28
+ uv pip install 'paperpipe[paperqa]'
29
+
30
+ # Everything
31
+ uv pip install 'paperpipe[all]'
32
+ ```
33
+
34
+ Or install from source:
35
+ ```bash
36
+ git clone https://github.com/hummat/paperpipe
37
+ cd paperpipe
38
+ uv pip install -e ".[all]"
39
+ ```
40
+
41
+ ### With pip
42
+
43
+ ```bash
44
+ # Basic installation
45
+ pip install paperpipe
46
+
47
+ # With LLM support (for better summaries/equations)
48
+ pip install 'paperpipe[llm]'
49
+
50
+ # With PaperQA2 integration
51
+ pip install 'paperpipe[paperqa]'
52
+
53
+ # With PaperQA2 + multimodal PDF parsing (images/tables; installs Pillow)
54
+ pip install 'paperpipe[paperqa-media]'
55
+
56
+ # Everything
57
+ pip install 'paperpipe[all]'
58
+ ```
59
+
60
+ Or install from source:
61
+ ```bash
62
+ git clone https://github.com/hummat/paperpipe
63
+ cd paperpipe
64
+ pip install -e ".[all]"
65
+ ```
66
+
67
+ ## Development
68
+
69
+ ```bash
70
+ # Install app + dev tooling (ruff, pyright, pytest)
71
+ uv sync --group dev
72
+
73
+ uv run ruff check .
74
+ uv run pyright
75
+ uv run pytest -m "not integration"
76
+ ```
77
+
78
+ ## Quick Start
79
+
80
+ ```bash
81
+ # Add papers (names auto-generated from title; auto-tags from arXiv + LLM)
82
+ papi add 2303.13476 2106.10689 2112.03907
83
+
84
+ # Override auto-generated name with --name (single paper only):
85
+ papi add https://arxiv.org/abs/1706.03762 --name attention
86
+
87
+ # Re-adding the same arXiv ID is idempotent (skips). Use --update to refresh, or --duplicate for another copy:
88
+ papi add 1706.03762
89
+ papi add 1706.03762 --update --name attention
90
+ papi add 1706.03762 --duplicate
91
+
92
+ # List papers
93
+ papi list
94
+ papi list --tag sdf
95
+
96
+ # Search
97
+ papi search "surface reconstruction"
98
+
99
+ # Export for coding session
100
+ papi export neuralangelo neus --level equations --to ./paper-context/
101
+
102
+ # Query with PaperQA2 (if installed)
103
+ papi ask "What are the key differences between NeuS and Neuralangelo loss functions?"
104
+ ```
105
+
106
+ ## Database Structure
107
+
108
+ Default database root is `~/.paperpipe/` (override with `PAPER_DB_PATH`; see `papi path`).
109
+
110
+ ```
111
+ <paper_db>/
112
+ ├── index.json # Quick lookup index
113
+ ├── papers/
114
+ │ ├── neuralangelo/
115
+ │ │ ├── meta.json # Metadata + tags
116
+ │ │ ├── paper.pdf # For PaperQA2
117
+ │ │ ├── source.tex # Full LaTeX (if available)
118
+ │ │ ├── summary.md # Coding-context summary
119
+ │ │ └── equations.md # Key equations extracted
120
+ │ └── neus/
121
+ │ └── ...
122
+ ```
123
+
124
+ ## Integration with Coding Agents
125
+
126
+ > **Tip:** See [AGENT_INTEGRATION.md](AGENT_INTEGRATION.md) for a ready-to-use snippet you can append to your
127
+ > repo's agent instructions file (for example `AGENTS.md`, `CLAUDE.md`, `GEMINI.md`).
128
+
129
+ ### Claude Code / Codex CLI Skill
130
+
131
+ paperpipe includes a skill that automatically activates when you ask about papers,
132
+ verification, or equations. Install it for Claude Code and/or Codex CLI:
133
+
134
+ ```bash
135
+ # Install for both Claude Code and Codex CLI
136
+ papi install-skill
137
+
138
+ # Or install for a specific CLI only
139
+ papi install-skill --claude
140
+ papi install-skill --codex
141
+ ```
142
+
143
+ Restart your CLI after installing the skill.
144
+
145
+ Most coding-agent CLIs can read local files directly. The best workflow is:
146
+
147
+ 1. Use `papi` to build/manage your paper collection.
148
+ 2. For code verification, have the agent read `{paper}/equations.md` (and `source.tex` when needed).
149
+ 3. For research-y questions across many papers, use `papi ask` (PaperQA2).
150
+
151
+ Minimal snippet to add to your agent instructions:
152
+
153
+ ```markdown
154
+ ## Paper References (PaperPipe)
155
+
156
+ PaperPipe manages papers via `papi`. Find the active database root with:
157
+ `papi path`
158
+
159
+ Per-paper files are under `<paper_db>/papers/{paper}/`:
160
+ - `equations.md` — best for implementation verification
161
+ - `summary.md` — high-level overview
162
+ - `source.tex` — exact definitions (if available)
163
+
164
+ Use `papi search "query"` to find papers/tags quickly.
165
+ Use `papi ask "question"` for PaperQA2 multi-paper queries (if installed).
166
+ ```
167
+
168
+ If you want paper context inside your repo (useful for agents that can’t access `~`), export it:
169
+
170
+ ```bash
171
+ papi export neuralangelo neus --level equations --to ./paper-context/
172
+ ```
173
+
174
+ If you want to paste context directly into a terminal agent session, print to stdout:
175
+
176
+ ```bash
177
+ papi show neuralangelo neus --level eq
178
+ ```
179
+
180
+ ## Commands
181
+
182
+ | Command | Description |
183
+ |---------|-------------|
184
+ | `papi add <ids-or-urls...>` | Add one or more papers (idempotent by arXiv ID; use `--update`/`--duplicate` for existing) |
185
+ | `papi regenerate <papers...>` | Regenerate summary/equations/tags (use `--overwrite name` to rename) |
186
+ | `papi regenerate --all` | Regenerate for all papers |
187
+ | `papi audit [papers...]` | Audit generated summaries/equations and optionally regenerate flagged papers |
188
+ | `papi remove <papers...>` | Remove one or more papers (by name or arXiv ID/URL) |
189
+ | `papi list [--tag TAG]` | List papers, optionally filtered by tag |
190
+ | `papi search <query>` | Exact search (with fuzzy fallback if no exact matches) across title/tags/metadata + local summaries/equations (use `--exact` to disable fallback; `--tex` includes LaTeX) |
191
+ | `papi show <papers...>` | Show paper details or print stored content |
192
+ | `papi export <papers...>` | Export context files to a directory |
193
+ | `papi ask <query> [args]` | Query papers via PaperQA2 (supports all pqa args) |
194
+ | `papi models` | Probe which models work with your API keys |
195
+ | `papi tags` | List all tags with counts |
196
+ | `papi path` | Print database location |
197
+ | `papi install-skill` | Install the papi skill for Claude Code / Codex CLI |
198
+ | `--quiet/-q` | Suppress progress messages |
199
+ | `--verbose/-v` | Enable debug output |
200
+
201
+ ## Tagging
202
+
203
+ Papers are tagged from three sources:
204
+
205
+ 1. **arXiv categories** → human-readable tags (cs.CV → computer-vision)
206
+ 2. **LLM-generated** → semantic tags from title/abstract
207
+ 3. **User-provided** → via `--tags` flag
208
+
209
+ ```bash
210
+ # Auto-tags from arXiv + LLM
211
+ papi add 2303.13476
212
+ # → name: neuralangelo, tags: computer-vision, graphics, neural-radiance-field, sdf, hash-encoding
213
+
214
+ # Add custom tags (and override auto-name)
215
+ papi add 2303.13476 --name my-neuralangelo --tags my-project,priority
216
+ ```
217
+
218
+ ## Export Levels
219
+
220
+ ```bash
221
+ # Just summaries (smallest, good for overview)
222
+ papi export neuralangelo neus --level summary
223
+
224
+ # Equations only (best for code verification)
225
+ papi export neuralangelo neus --level equations
226
+
227
+ # Full LaTeX source (most complete)
228
+ papi export neuralangelo neus --level full
229
+ ```
230
+
231
+ ## Show Levels (stdout)
232
+
233
+ ```bash
234
+ # Metadata (default)
235
+ papi show neuralangelo
236
+
237
+ # Print equations (for piping into agent sessions)
238
+ papi show neuralangelo neus --level eq
239
+
240
+ # Print summary / LaTeX
241
+ papi show neuralangelo --level summary
242
+ papi show neuralangelo --level tex
243
+ ```
244
+
245
+ ## Workflow Example
246
+
247
+ ```bash
248
+ # 1. Build your paper collection (names auto-generated)
249
+ papi add 2303.13476 2106.10689 2104.06405
250
+ # → neuralangelo, neus, volsdf
251
+
252
+ # 2. Research phase: use PaperQA2
253
+ papi ask "Compare the volume rendering approaches in NeuS, VolSDF, and Neuralangelo"
254
+
255
+ # 3. Implementation phase: export equations to project
256
+ cd ~/my-neural-surface-project
257
+ papi export neuralangelo neus volsdf --level equations --to ./paper-context/
258
+
259
+ # 4. In Claude Code / Codex / Gemini:
260
+ # "Compare my eikonal_loss() implementation with the formulations in paper-context/"
261
+
262
+ # 5. Clean up: remove papers you no longer need
263
+ papi remove volsdf neus
264
+ ```
265
+
266
+ ## Configuration
267
+
268
+ Set custom database location:
269
+ ```bash
270
+ export PAPER_DB_PATH=/path/to/your/papers
271
+ ```
272
+
273
+ ## Environment Setup
274
+
275
+ To use PaperQA2 via `papi ask` with the built-in default models, set the environment variables for your
276
+ chosen provider (PaperQA2 uses LiteLLM identifiers for `--llm` and `--embedding`).
277
+
278
+ | Provider | Required Env Var | Used For |
279
+ |----------|------------------|----------|
280
+ | **Google** | `GEMINI_API_KEY` | Gemini models & embeddings |
281
+ | **Anthropic** | `ANTHROPIC_API_KEY` | Claude models |
282
+ | **Voyage AI** | `VOYAGE_API_KEY` | Embeddings (recommended when using Claude) |
283
+ | **OpenAI** | `OPENAI_API_KEY` | GPT models & embeddings |
284
+
285
+ ## LLM Support
286
+
287
+ For better summaries and equation extraction, install with LLM support:
288
+
289
+ ```bash
290
+ pip install 'paperpipe[llm]'
291
+ # or with uv:
292
+ uv pip install 'paperpipe[llm]'
293
+ ```
294
+
295
+ This installs LiteLLM, which supports many providers. Set the appropriate API key:
296
+
297
+ ```bash
298
+ export GEMINI_API_KEY=... # For Gemini (default)
299
+ export OPENAI_API_KEY=... # For OpenAI/GPT
300
+ export ANTHROPIC_API_KEY=... # For Claude
301
+ ```
302
+
303
+ paperpipe defaults to `gemini/gemini-3-flash-preview`. Override via:
304
+ ```bash
305
+ export PAPERPIPE_LLM_MODEL=gpt-4o # or any LiteLLM model identifier
306
+ ```
307
+
308
+ You can also tune LLM generation:
309
+ ```bash
310
+ export PAPERPIPE_LLM_TEMPERATURE=0.3 # default: 0.3
311
+ ```
312
+
313
+ Without LLM support, paperpipe falls back to:
314
+ - Metadata + section headings from LaTeX
315
+ - Regex-based equation extraction
316
+
317
+ ## PaperQA2 Integration
318
+
319
+ When both paperpipe and [PaperQA2](https://github.com/Future-House/paper-qa) are installed, they share the same PDFs:
320
+
321
+ ```bash
322
+ # paperpipe stores PDFs in <paper_db>/papers/*/paper.pdf (see `papi path`)
323
+ # `papi ask` routes to PaperQA2 for complex queries
324
+
325
+ papi ask "What optimizer settings do these papers recommend?"
326
+
327
+ # PaperQA2 uses LiteLLM model identifiers for `--llm` and `--embedding`.
328
+ # You can also pass through any other `pqa ask` flags after the query/options.
329
+ # By default, `papi ask` uses `pqa --settings default` to avoid failures caused by stale user
330
+ # settings files; pass `-s/--settings <name>` to use a specific PaperQA2 settings profile.
331
+ # `papi ask` also defaults to `--llm gemini/gemini-3-flash-preview` and `--embedding gemini/gemini-embedding-001`
332
+ # unless you pick a PaperQA2 settings profile with `-s/--settings` (in that case, the profile controls).
333
+ # If Pillow is not installed, `papi ask` also forces `--parsing.multimodal OFF` to avoid PDF
334
+ # image extraction errors; pass your own `--parsing...` args to override.
335
+ #
336
+ # Examples (specify LLM + embedding):
337
+ # Gemini 3 Flash + Google Embeddings
338
+ papi ask "Explain the architecture" --llm "gemini/gemini-3-flash-preview" --embedding "gemini/gemini-embedding-001"
339
+
340
+ # Gemini 3 Pro + Google Embeddings
341
+ papi ask "Give a detailed derivation of eq. 4 and explain implementation pitfalls" --llm "gemini/gemini-3-pro-preview" --embedding "gemini/gemini-embedding-001"
342
+
343
+ # Claude Sonnet 4.5 + Voyage AI Embeddings
344
+ papi ask "Compare the loss functions" --llm "claude-sonnet-4-5" --embedding "voyage/voyage-3-large"
345
+
346
+ # GPT-5.2 + OpenAI Embeddings
347
+ papi ask "How to implement eq 4?" --llm "gpt-5.2" --embedding "text-embedding-3-large"
348
+
349
+ # Pass any arbitrary PaperQA2 arguments (e.g., temperature, verbosity)
350
+ papi ask "Summarize the methods" --summary-llm gpt-4o-mini --temperature 0.2 --verbosity 2
351
+ ```
352
+
353
+ ### Model Probing
354
+
355
+ To see which model IDs work with your currently configured API keys (this makes small live API calls):
356
+
357
+ ```bash
358
+ papi models
359
+ # (default: probes one "latest" completion model and one embedding model per provider for
360
+ # which you have an API key set; pass `latest` (or `--preset latest`) to probe a broader list.)
361
+ # or probe specific models only:
362
+ papi models --kind completion --model gemini/gemini-3-flash-preview --model gemini/gemini-2.5-flash --model gpt-4o-mini
363
+ papi models --kind embedding --model gemini/gemini-embedding-001 --model text-embedding-3-small
364
+ # probe "latest" defaults (gpt-5.2/5.1, gemini 3 preview, claude-sonnet-4-5; plus text-embedding-3-large if enabled):
365
+ papi models latest
366
+ # probe "last-gen" defaults (gpt-4.1/4o, gemini 2.5, older/smaller embeddings; Claude 3.5 is retired):
367
+ papi models last-gen
368
+ # probe a broader superset:
369
+ papi models all
370
+ # show underlying provider errors (noisy):
371
+ papi models --verbose
372
+ ```
373
+
374
+ ## Non-arXiv Papers
375
+
376
+ PaperPipe currently focuses on arXiv ingestion (`papi add <arxiv-id-or-url>`). For papers not on arXiv, you can still
377
+ store files for agents to read, but they will not show up in `papi list` or `papi search` unless you also add
378
+ index/meta entries.
379
+
380
+ ```bash
381
+ PAPER_DB="$(papi path)"
382
+ mkdir -p "$PAPER_DB/papers/my-paper"
383
+ cp /path/to/paper.pdf "$PAPER_DB/papers/my-paper/paper.pdf"
384
+ # Create:
385
+ # - "$PAPER_DB/papers/my-paper/summary.md"
386
+ # - "$PAPER_DB/papers/my-paper/equations.md"
387
+ # (optional) "$PAPER_DB/papers/my-paper/source.tex"
388
+ ```
389
+
390
+ ## Credits
391
+
392
+ - **[PaperQA2](https://github.com/Future-House/paper-qa)** by Future House — the RAG engine powering `papi ask`.
393
+ *Skarlinski et al., "Language Agents Achieve Superhuman Synthesis of Scientific Knowledge", 2024.*
394
+ [arXiv:2409.13740](https://arxiv.org/abs/2409.13740)
395
+
396
+ ## License
397
+
398
+ MIT (see [LICENSE](LICENSE))
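The manual steps in the README's "Non-arXiv Papers" section can be wrapped in a small function. This is a sketch under stated assumptions: `scaffold_paper` is a hypothetical helper, not a `papi` command, and the empty `summary.md` and `equations.md` files it creates are placeholders to fill in by hand.

```bash
# Sketch only: scaffold_paper is a hypothetical helper, not part of papi.
# Copies a PDF into <db_root>/papers/<name>/ and creates empty placeholder
# files for the summary and equations. Prints the paper directory.
# Usage: scaffold_paper <db_root> <paper_name> <pdf_path>
scaffold_paper() (
  db_root="$1"
  name="$2"
  pdf="$3"
  dir="$db_root/papers/$name"
  mkdir -p "$dir" || exit 1
  cp "$pdf" "$dir/paper.pdf" || exit 1
  # Create placeholders only when missing, so existing notes are kept.
  for f in summary.md equations.md; do
    [ -e "$dir/$f" ] || : > "$dir/$f"
  done
  printf '%s\n' "$dir"
)
```

With paperpipe installed, a call such as `scaffold_paper "$(papi path)" my-paper /path/to/paper.pdf` would set up the directory. As the README notes, a paper added this way is readable by agents but does not appear in `papi list` or `papi search` without index/meta entries.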