paperpipe 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

paperpipe-0.1.0/.gitignore +23 -0
paperpipe-0.1.0/AGENT_INTEGRATION.md +84 -0
paperpipe-0.1.0/LICENSE +21 -0
paperpipe-0.1.0/PKG-INFO +459 -0
paperpipe-0.1.0/README.md +398 -0
paperpipe-0.1.0/paperpipe.py +2959 -0
paperpipe-0.1.0/pyproject.toml +122 -0
paperpipe-0.1.0/skill/SKILL.md +83 -0
paperpipe-0.1.0/skill/commands.md +81 -0

paperpipe-0.1.0/.gitignore ADDED

@@ -0,0 +1,23 @@
+__pycache__/
+*.py[cod]
+*$py.class
+.pytest_cache/
+.ruff_cache/
+.ruff_cache_tmp/
+.claude/
+.venv/
+.uv-cache/
+env/
+venv/
+build/
+dist/
+*.egg-info/
+.coverage
+.coverage.*
+coverage.xml
+htmlcov/
+.python-version
+uv.lock
+CLAUDE.md
+GEMINI.md
+QWEN.md

paperpipe-0.1.0/AGENT_INTEGRATION.md ADDED

@@ -0,0 +1,84 @@
+# Agent Integration Snippet (PaperPipe)
+Add this section to your project's agent instructions file:
+- Preferred: `AGENTS.md`
+- Also works: `CLAUDE.md`, `GEMINI.md`, or your agent’s equivalent
+---
+## Paper References (PaperPipe)
+This project implements methods from scientific papers. Papers are managed via `papi` (paperpipe).
+### Paper Database Location
+Default database root is `~/.paperpipe/`, but it may be overridden (e.g. via `PAPER_DB_PATH`).
+Prefer discovering the active location with:
+```bash
+papi path
+```
+Per-paper files live at: `<paper_db>/papers/{paper}/`
+- `meta.json` — metadata + tags
+- `summary.md` — coding-context overview
+- `equations.md` — key equations + explanations (best for implementation verification)
+- `source.tex` — full LaTeX (if available)
+- `paper.pdf` — PDF (used by PaperQA2)
+### When to Use What
+| Task | Best source |
+|------|-------------|
+| “Does my code match the paper?” | Read `{paper}/equations.md` (and/or `{paper}/source.tex`) |
+| “What’s the high-level approach?” | Read `{paper}/summary.md` |
+| “Find the exact formulation / definitions” | Read `{paper}/source.tex` |
+| “Which papers discuss X?” | Run `papi search "X"` (fast) or `papi ask "X"` (PaperQA2) |
+| “Compare methods across papers” | Load multiple `{paper}/equations.md` files |
+| “Do the generated summaries/equations look sane?” | Run `papi audit` (and optionally regenerate flagged papers) |
+### Useful Commands
+```bash
+# List papers and tags
+papi list
+papi tags
+# Search by title, tag, or content
+papi search "sdf loss"
+# Export equations/summaries into the repo for a coding session
+papi export neuralangelo neus --level equations --to ./paper-context/
+# Or print directly to stdout for pasting into a terminal agent session
+papi show neuralangelo neus --level eq
+# Add papers (arXiv) / regenerate; use --no-llm to avoid LLM calls
+papi add 2303.13476                      # name auto-generated
+papi add 2303.13476 --name neuralangelo  # or explicit name
+papi add 2303.13476 --update             # refresh existing paper in-place
+papi add 2303.13476 --duplicate          # add a second copy (-2/-3 suffix)
+papi regenerate neuralangelo --no-llm
+# Audit generated content for obvious issues (and optionally regenerate flagged papers)
+papi audit
+papi audit --limit 5 --seed 0
+papi audit --regenerate --no-llm -o summary,equations,tags
+```
+### LLM Configuration (Optional)
+```bash
+export PAPERPIPE_LLM_MODEL="gemini/gemini-3-flash-preview"  # any LiteLLM identifier
+export PAPERPIPE_LLM_TEMPERATURE=0.3                        # default: 0.3
+```
+Without LLM, paperpipe falls back to metadata + section headings + regex equation extraction.
+### Code Verification Workflow
+1. Identify the referenced paper(s) (comments, function names, README, etc.)
+2. Read `{paper}/equations.md` and compare symbol-by-symbol with the implementation
+3. If ambiguous, confirm definitions/assumptions in `{paper}/source.tex`
+4. If the question is broad or spans multiple papers, run `papi ask "..."` (requires PaperQA2)
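
Reviewer sketch (not part of the package diff): the snippet above describes a database root that defaults to `~/.paperpipe/`, can be overridden via `PAPER_DB_PATH`, and holds per-paper files under `<paper_db>/papers/{paper}/`. The Python below is a minimal illustration of that layout for an agent-side helper. The function names `resolve_paper_db` and `paper_files` are hypothetical, and paperpipe may resolve the root differently, so `papi path` remains the authoritative source.

```python
import os
from pathlib import Path

# File names copied from the "Per-paper files" list in AGENT_INTEGRATION.md.
PAPER_FILES = ("meta.json", "summary.md", "equations.md", "source.tex", "paper.pdf")


def resolve_paper_db(env=None):
    """Return PAPER_DB_PATH if set, else the documented default ~/.paperpipe.

    Assumption: this mirrors the documented behaviour only; it does not call papi.
    """
    env = os.environ if env is None else env
    override = env.get("PAPER_DB_PATH")
    return Path(override).expanduser() if override else Path.home() / ".paperpipe"


def paper_files(paper, root=None):
    """Map each documented file name to its path, keeping only files that exist."""
    root = resolve_paper_db() if root is None else Path(root)
    paper_dir = root / "papers" / paper
    return {
        name: paper_dir / name
        for name in PAPER_FILES
        if (paper_dir / name).is_file()
    }
```

A caller could check `"equations.md" in paper_files("neuralangelo")` before starting a verification pass and fall back to `source.tex` when it is absent.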

paperpipe-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2025 Matthias Humt
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

paperpipe-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,459 @@
+Metadata-Version: 2.4
+Name: paperpipe
+Version: 0.1.0
+Summary: Unified paper database for coding agents + PaperQA2
+Project-URL: Homepage, https://github.com/hummat/paperpipe
+Project-URL: Documentation, https://github.com/hummat/paperpipe#readme
+Project-URL: Repository, https://github.com/hummat/paperpipe
+Author: Matthias Humt
+License: MIT License
+        Copyright (c) 2025 Matthias Humt
+        Permission is hereby granted, free of charge, to any person obtaining a copy
+        of this software and associated documentation files (the "Software"), to deal
+        in the Software without restriction, including without limitation the rights
+        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+        copies of the Software, and to permit persons to whom the Software is
+        furnished to do so, subject to the following conditions:
+        The above copyright notice and this permission notice shall be included in all
+        copies or substantial portions of the Software.
+        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+        SOFTWARE.
+License-File: LICENSE
+Keywords: arxiv,coding-agent,llm,paperqa,papers,research
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Science/Research
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Requires-Python: >=3.10
+Requires-Dist: arxiv>=2.0.0
+Requires-Dist: click>=8.0.0
+Requires-Dist: requests>=2.28.0
+Provides-Extra: all
+Requires-Dist: litellm>=1.0.0; extra == 'all'
+Requires-Dist: paper-qa>=5.0.0; (python_version >= '3.11') and extra == 'all'
+Requires-Dist: paper-qa[pypdf-media]>=5.0.0; (python_version >= '3.11') and extra == 'all'
+Provides-Extra: dev
+Requires-Dist: build>=1.0.0; extra == 'dev'
+Requires-Dist: pyright>=1.1.385; extra == 'dev'
+Requires-Dist: pytest>=7.0.0; extra == 'dev'
+Requires-Dist: ruff>=0.1.0; extra == 'dev'
+Requires-Dist: twine>=5.0.0; extra == 'dev'
+Provides-Extra: llm
+Requires-Dist: litellm>=1.0.0; extra == 'llm'
+Provides-Extra: paperqa
+Requires-Dist: paper-qa>=5.0.0; (python_version >= '3.11') and extra == 'paperqa'
+Provides-Extra: paperqa-media
+Requires-Dist: paper-qa[pypdf-media]>=5.0.0; (python_version >= '3.11') and extra == 'paperqa-media'
+Description-Content-Type: text/markdown
+# paperpipe
+A unified paper database for coding agents + [PaperQA2](https://github.com/Future-House/paper-qa).
+**The problem:** You want AI coding assistants (Claude Code, Codex CLI, Gemini CLI) to reference scientific papers while implementing algorithms. But:
+- PDFs are token-heavy and lose equation fidelity
+- PaperQA2 is great for research but not optimized for code verification
+- No simple way to ask "does my code match equation 7?"
+**The solution:** A local database that stores:
+- PDFs (for PaperQA2 RAG queries)
+- LaTeX source (for exact equation comparison)
+- Summaries optimized for coding context
+- Extracted equations with explanations
+## Installation
+### With uv (recommended)
+```bash
+# Basic installation
+uv pip install paperpipe
+# With LLM support (for better summaries/equations)
+uv pip install 'paperpipe[llm]'
+# With PaperQA2 integration
+uv pip install 'paperpipe[paperqa]'
+# Everything
+uv pip install 'paperpipe[all]'
+```
+Or install from source:
+```bash
+git clone https://github.com/hummat/paperpipe
+cd paperpipe
+uv pip install -e ".[all]"
+```
+### With pip
+```bash
+# Basic installation
+pip install paperpipe
+# With LLM support (for better summaries/equations)
+pip install 'paperpipe[llm]'
+# With PaperQA2 integration
+pip install 'paperpipe[paperqa]'
+# With PaperQA2 + multimodal PDF parsing (images/tables; installs Pillow)
+pip install 'paperpipe[paperqa-media]'
+# Everything
+pip install 'paperpipe[all]'
+```
+Or install from source:
+```bash
+git clone https://github.com/hummat/paperpipe
+cd paperpipe
+pip install -e ".[all]"
+```
+## Development
+```bash
+# Install app + dev tooling (ruff, pyright, pytest)
+uv sync --group dev
+uv run ruff check .
+uv run pyright
+uv run pytest -m "not integration"
+```
+## Quick Start
+```bash
+# Add papers (names auto-generated from title; auto-tags from arXiv + LLM)
+papi add 2303.13476 2106.10689 2112.03907
+# Override auto-generated name with --name (single paper only):
+papi add https://arxiv.org/abs/1706.03762 --name attention
+# Re-adding the same arXiv ID is idempotent (skips). Use --update to refresh, or --duplicate for another copy:
+papi add 1706.03762
+papi add 1706.03762 --update --name attention
+papi add 1706.03762 --duplicate
+# List papers
+papi list
+papi list --tag sdf
+# Search
+papi search "surface reconstruction"
+# Export for coding session
+papi export neuralangelo neus --level equations --to ./paper-context/
+# Query with PaperQA2 (if installed)
+papi ask "What are the key differences between NeuS and Neuralangelo loss functions?"
+```
+## Database Structure
+Default database root is `~/.paperpipe/` (override with `PAPER_DB_PATH`; see `papi path`).
+```
+<paper_db>/
+├── index.json                    # Quick lookup index
+├── papers/
+│   ├── neuralangelo/
+│   │   ├── meta.json             # Metadata + tags
+│   │   ├── paper.pdf             # For PaperQA2
+│   │   ├── source.tex            # Full LaTeX (if available)
+│   │   ├── summary.md            # Coding-context summary
+│   │   └── equations.md          # Key equations extracted
+│   └── neus/
+│       └── ...
+```
+## Integration with Coding Agents
+> **Tip:** See [AGENT_INTEGRATION.md](AGENT_INTEGRATION.md) for a ready-to-use snippet you can append to your
+> repo's agent instructions file (for example `AGENTS.md`, `CLAUDE.md`, `GEMINI.md`).
+### Claude Code / Codex CLI Skill
+paperpipe includes a skill that automatically activates when you ask about papers,
+verification, or equations. Install it for Claude Code and/or Codex CLI:
+```bash
+# Install for both Claude Code and Codex CLI
+papi install-skill
+# Or install for a specific CLI only
+papi install-skill --claude
+papi install-skill --codex
+```
+Restart your CLI after installing the skill.
+Most coding-agent CLIs can read local files directly. The best workflow is:
+1. Use `papi` to build/manage your paper collection.
+2. For code verification, have the agent read `{paper}/equations.md` (and `source.tex` when needed).
+3. For research-y questions across many papers, use `papi ask` (PaperQA2).
+Minimal snippet to add to your agent instructions:
+```markdown
+## Paper References (PaperPipe)
+PaperPipe manages papers via `papi`. Find the active database root with:
+`papi path`
+Per-paper files are under `<paper_db>/papers/{paper}/`:
+- `equations.md` — best for implementation verification
+- `summary.md` — high-level overview
+- `source.tex` — exact definitions (if available)
+Use `papi search "query"` to find papers/tags quickly.
+Use `papi ask "question"` for PaperQA2 multi-paper queries (if installed).
+```
+If you want paper context inside your repo (useful for agents that can’t access `~`), export it:
+```bash
+papi export neuralangelo neus --level equations --to ./paper-context/
+```
+If you want to paste context directly into a terminal agent session, print to stdout:
+```bash
+papi show neuralangelo neus --level eq
+```
+## Commands
+| Command | Description |
+|---------|-------------|
+| `papi add <ids-or-urls...>` | Add one or more papers (idempotent by arXiv ID; use `--update`/`--duplicate` for existing) |
+| `papi regenerate <papers...>` | Regenerate summary/equations/tags (use `--overwrite name` to rename) |
+| `papi regenerate --all` | Regenerate for all papers |
+| `papi audit [papers...]` | Audit generated summaries/equations and optionally regenerate flagged papers |
+| `papi remove <papers...>` | Remove one or more papers (by name or arXiv ID/URL) |
+| `papi list [--tag TAG]` | List papers, optionally filtered by tag |
+| `papi search <query>` | Exact search (with fuzzy fallback if no exact matches) across title/tags/metadata + local summaries/equations (use `--exact` to disable fallback; `--tex` includes LaTeX) |
+| `papi show <papers...>` | Show paper details or print stored content |
+| `papi export <papers...>` | Export context files to a directory |
+| `papi ask <query> [args]` | Query papers via PaperQA2 (supports all pqa args) |
+| `papi models` | Probe which models work with your API keys |
+| `papi tags` | List all tags with counts |
+| `papi path` | Print database location |
+| `papi install-skill` | Install the papi skill for Claude Code / Codex CLI |
+| `--quiet/-q` | Suppress progress messages |
+| `--verbose/-v` | Enable debug output |
+## Tagging
+Papers are automatically tagged from three sources:
+1. **arXiv categories** → human-readable tags (cs.CV → computer-vision)
+2. **LLM-generated** → semantic tags from title/abstract
+3. **User-provided** → via `--tags` flag
+```bash
+# Auto-tags from arXiv + LLM
+papi add 2303.13476
+# → name: neuralangelo, tags: computer-vision, graphics, neural-radiance-field, sdf, hash-encoding
+# Add custom tags (and override auto-name)
+papi add 2303.13476 --name my-neuralangelo --tags my-project,priority
+```
+## Export Levels
+```bash
+# Just summaries (smallest, good for overview)
+papi export neuralangelo neus --level summary
+# Equations only (best for code verification)
+papi export neuralangelo neus --level equations
+# Full LaTeX source (most complete)
+papi export neuralangelo neus --level full
+```
+## Show Levels (stdout)
+```bash
+# Metadata (default)
+papi show neuralangelo
+# Print equations (for piping into agent sessions)
+papi show neuralangelo neus --level eq
+# Print summary / LaTeX
+papi show neuralangelo --level summary
+papi show neuralangelo --level tex
+```
+## Workflow Example
+```bash
+# 1. Build your paper collection (names auto-generated)
+papi add 2303.13476 2106.10689 2104.06405
+# → neuralangelo, neus, volsdf
+# 2. Research phase: use PaperQA2
+papi ask "Compare the volume rendering approaches in NeuS, VolSDF, and Neuralangelo"
+# 3. Implementation phase: export equations to project
+cd ~/my-neural-surface-project
+papi export neuralangelo neus volsdf --level equations --to ./paper-context/
+# 4. In Claude Code / Codex / Gemini:
+# "Compare my eikonal_loss() implementation with the formulations in paper-context/"
+# 5. Clean up: remove papers you no longer need
+papi remove volsdf neus
+```
+## Configuration
+Set custom database location:
+```bash
+export PAPER_DB_PATH=/path/to/your/papers
+```
+## Environment Setup
+To use PaperQA2 via `papi ask` with the built-in default models, set the environment variables for your
+chosen provider (PaperQA2 uses LiteLLM identifiers for `--llm` and `--embedding`).
+| Provider | Required Env Var | Used For |
+|----------|------------------|----------|
+| **Google** | `GEMINI_API_KEY` | Gemini models & embeddings |
+| **Anthropic** | `ANTHROPIC_API_KEY` | Claude models |
+| **Voyage AI** | `VOYAGE_API_KEY` | Embeddings (recommended when using Claude) |
+| **OpenAI** | `OPENAI_API_KEY` | GPT models & embeddings |
+## LLM Support
+For better summaries and equation extraction, install with LLM support:
+```bash
+pip install 'paperpipe[llm]'
+# or with uv:
+uv pip install 'paperpipe[llm]'
+```
+This installs LiteLLM, which supports many providers. Set the appropriate API key:
+```bash
+export GEMINI_API_KEY=...      # For Gemini (default)
+export OPENAI_API_KEY=...      # For OpenAI/GPT
+export ANTHROPIC_API_KEY=...   # For Claude
+```
+paperpipe defaults to `gemini/gemini-3-flash-preview`. Override via:
+```bash
+export PAPERPIPE_LLM_MODEL=gpt-4o  # or any LiteLLM model identifier
+```
+You can also tune LLM generation:
+```bash
+export PAPERPIPE_LLM_TEMPERATURE=0.3  # default: 0.3
+```
+Without LLM support, paperpipe falls back to:
+- Metadata + section headings from LaTeX
+- Regex-based equation extraction
+## PaperQA2 Integration
+When both paperpipe and [PaperQA2](https://github.com/Future-House/paper-qa) are installed, they share the same PDFs:
+```bash
+# paperpipe stores PDFs in <paper_db>/papers/*/paper.pdf (see `papi path`)
+# `papi ask` routes to PaperQA2 for complex queries
+papi ask "What optimizer settings do these papers recommend?"
+# PaperQA uses LiteLLM model identifiers for `--llm` and `--embedding`.
+# You can also pass through any other `pqa ask` flags after the query/options.
+# By default, `papi ask` uses `pqa --settings default` to avoid failures caused by stale user
+# settings files; pass `-s/--settings <name>` to use a specific PaperQA2 settings profile.
+# `papi ask` also defaults to `--llm gemini/gemini-3-flash-preview` and `--embedding gemini/gemini-embedding-001`
+# unless you pick a PaperQA2 settings profile with `-s/--settings` (in that case, the profile controls).
+# If Pillow is not installed, `papi ask` also forces `--parsing.multimodal OFF` to avoid PDF
+# image extraction errors; pass your own `--parsing...` args to override.
+#
+# Examples (specify LLM + embedding):
+# Gemini 3 Flash + Google Embeddings
+papi ask "Explain the architecture" --llm "gemini/gemini-3-flash-preview" --embedding "gemini/gemini-embedding-001"
+# Gemini 3 Pro + Google Embeddings
+papi ask "Give a detailed derivation of eq. 4 and explain implementation pitfalls" --llm "gemini/gemini-3-pro-preview" --embedding "gemini/gemini-embedding-001"
+# Claude Sonnet 4.5 + Voyage AI Embeddings
+papi ask "Compare the loss functions" --llm "claude-sonnet-4-5" --embedding "voyage/voyage-3-large"
+# GPT-5.2 + OpenAI Embeddings
+papi ask "How to implement eq 4?" --llm "gpt-5.2" --embedding "text-embedding-3-large"
+# Pass any arbitrary PaperQA2 arguments (e.g., temperature, verbosity)
+papi ask "Summarize the methods" --summary-llm gpt-4o-mini --temperature 0.2 --verbosity 2
+```
+### Model Probing
+To see which model ids work with your currently configured API keys (this makes small live API calls):
+```bash
+papi models
+# (default: probes one "latest" completion model and one embedding model per provider for
+# which you have an API key set; pass `latest` (or `--preset latest`) to probe a broader list.)
+# or probe specific models only:
+papi models --kind completion --model gemini/gemini-3-flash-preview --model gemini/gemini-2.5-flash --model gpt-4o-mini
+papi models --kind embedding --model gemini/gemini-embedding-001 --model text-embedding-3-small
+# probe "latest" defaults (gpt-5.2/5.1, gemini 3 preview, claude-sonnet-4-5; plus text-embedding-3-large if enabled):
+papi models latest
+# probe "last-gen" defaults (gpt-4.1/4o, gemini 2.5, older/smaller embeddings; Claude 3.5 is retired):
+papi models last-gen
+# probe a broader superset:
+papi models all
+# show underlying provider errors (noisy):
+papi models --verbose
+```
+## Non-arXiv Papers
+PaperPipe currently focuses on arXiv ingestion (`papi add <arxiv-id-or-url>`). For papers not on arXiv you can still
+store files for agents to read, but they will not show up in `papi list/search` unless you also add index/meta
+entries.
+```bash
+PAPER_DB="$(papi path)"
+mkdir -p "$PAPER_DB/papers/my-paper"
+cp /path/to/paper.pdf "$PAPER_DB/papers/my-paper/paper.pdf"
+# Create:
+# - "$PAPER_DB/papers/my-paper/summary.md"
+# - "$PAPER_DB/papers/my-paper/equations.md"
+# (optional) "$PAPER_DB/papers/my-paper/source.tex"
+```
+## Credits
+- **[PaperQA2](https://github.com/Future-House/paper-qa)** by Future House — the RAG engine powering `papi ask`.
+  *Skarlinski et al., "Language Agents Achieve Superhuman Synthesis of Scientific Knowledge", 2024.*
+  [arXiv:2409.13740](https://arxiv.org/abs/2409.13740)
+## License
+MIT (see [LICENSE](LICENSE))
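
Reviewer sketch (not part of the package diff): the "LLM Support" section of the README says that, without an LLM, paperpipe falls back to section headings from LaTeX and regex-based equation extraction. The Python below illustrates what such a fallback could look like. It is not paperpipe's actual code: the environment list, the patterns, and the function names `extract_equations` and `extract_headings` are assumptions made for illustration.

```python
import re

# Display-math environments to capture (an illustrative list, not paperpipe's).
_ENVS = ("equation", "align", "gather", "multline")

# Match \begin{env} ... \end{env}, including starred variants such as align*.
_ENV_RE = re.compile(
    r"\\begin\{(?P<env>(?:%s)\*?)\}(?P<body>.*?)\\end\{(?P=env)\}" % "|".join(_ENVS),
    re.DOTALL,
)

# Match \section{...}, \subsection{...}, \subsubsection*{...} without nested braces.
_SECTION_RE = re.compile(r"\\(?:sub)*section\*?\{([^{}]*)\}")


def extract_equations(tex):
    """Return the body of each display-math environment, in document order."""
    return [m.group("body").strip() for m in _ENV_RE.finditer(tex)]


def extract_headings(tex):
    """Return section and subsection titles, in document order."""
    return [m.group(1).strip() for m in _SECTION_RE.finditer(tex)]
```

A purely regex-based pass like this misses inline math, `\[ ... \]` blocks, headings with nested braces, and equations produced by custom macros, which is one reason the README recommends the LLM path for better summaries and equations.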