researchbot-0.1.0.tar.gz
- researchbot-0.1.0/.claude/commands/analyze.md +138 -0
- researchbot-0.1.0/.claude/commands/compare.md +98 -0
- researchbot-0.1.0/.claude/commands/expand.md +152 -0
- researchbot-0.1.0/.claude/commands/gaps.md +172 -0
- researchbot-0.1.0/.claude/commands/paper_review.md +156 -0
- researchbot-0.1.0/.claude/commands/verify_gaps.md +193 -0
- researchbot-0.1.0/.env +4 -0
- researchbot-0.1.0/.env.example +14 -0
- researchbot-0.1.0/.gitignore +19 -0
- researchbot-0.1.0/.mcp.json +8 -0
- researchbot-0.1.0/.python-version +1 -0
- researchbot-0.1.0/CLAUDE.md +97 -0
- researchbot-0.1.0/LICENSE +21 -0
- researchbot-0.1.0/PKG-INFO +228 -0
- researchbot-0.1.0/README.md +200 -0
- researchbot-0.1.0/docs/Vision.md +147 -0
- researchbot-0.1.0/docs/next_task.md +29 -0
- researchbot-0.1.0/docs/plan.md +173 -0
- researchbot-0.1.0/docs/vision_brain_dump.md +16 -0
- researchbot-0.1.0/docs/wip.md +20 -0
- researchbot-0.1.0/main.py +6 -0
- researchbot-0.1.0/pyproject.toml +42 -0
- researchbot-0.1.0/researchbot/__init__.py +1 -0
- researchbot-0.1.0/researchbot/cli.py +115 -0
- researchbot-0.1.0/researchbot/config.py +35 -0
- researchbot-0.1.0/researchbot/mcp_server.py +163 -0
- researchbot-0.1.0/researchbot/ocr.py +127 -0
- researchbot-0.1.0/researchbot/pdf.py +225 -0
- researchbot-0.1.0/researchbot/scholar.py +224 -0
- researchbot-0.1.0/uv.lock +1051 -0
researchbot-0.1.0/.claude/commands/analyze.md
@@ -0,0 +1,138 @@

# Analyze

Perform an in-depth analysis of an academic paper.

## Input

The user provides `$ARGUMENTS` in the format: `<paper_reference> [output_path] [--model=<opus|sonnet>]`

**Paper reference** (required) can be:
- An arXiv ID (e.g. `ARXIV:1706.03762`)
- A DOI (e.g. `DOI:10.18653/v1/N19-1423`)
- A Semantic Scholar URL or ID
- A paper name (e.g. `Attention is All You Need`)

**Output path** (optional) can be:
- A directory path (absolute or relative) where the analysis will be saved as `{paper_slug}.md`
- If not provided, the analysis is printed to the conversation

**Model** (optional):
- `--model=opus` (default) — Use Opus 4.6 for deep, thorough analysis
- `--model=sonnet` — Use Sonnet 4.5 for faster analysis

Examples:
- `/analyze "Attention is All You Need"` — Output to conversation, use opus
- `/analyze ARXIV:1706.03762 workspace/transformers/` — Save to directory, use opus
- `/analyze Mamba /home/user/papers/ --model=sonnet` — Save to absolute path, use sonnet
- `/analyze "LoRA" --model=sonnet` — Output to conversation, use sonnet

## Instructions

### Phase 1: Parse Arguments

1. **Extract the model parameter.** Check if `$ARGUMENTS` contains `--model=opus` or `--model=sonnet`.
   - If `--model=sonnet` is present, use model: "sonnet"
   - Otherwise, default to model: "opus"
   - Remove the `--model=...` flag from the arguments string

2. **Parse paper reference and output path.** From the remaining arguments, extract:
   - `paper_reference`: The paper identifier (everything before the last space, or all of it if no path provided)
   - `output_path`: Optional directory path (the last token if it looks like a path)

   To detect if an output path was provided, check if the last token:
   - Ends with `/` or `\`, OR
   - Contains `/` or `\` path separators, OR
   - Is explicitly a directory (check with bash `test -d`)

   If ambiguous, assume no output path was provided. The sketch below illustrates this parsing logic.
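
A minimal Python sketch of the Phase 1 logic, purely illustrative: the command performs this reasoning in natural language, and `parse_analyze_args` is a hypothetical name, not code shipped in the package.

```python
import os
import re


def parse_analyze_args(arguments: str) -> dict:
    """Hypothetical sketch of the /analyze argument parsing described above."""
    # Step 1: extract the --model flag; opus is the default.
    model = "sonnet" if "--model=sonnet" in arguments else "opus"
    remaining = re.sub(r"\s*--model=(?:opus|sonnet)\s*", " ", arguments).strip()

    # Step 2: treat the last token as an output path only if it looks like one.
    paper_reference, output_path = remaining, None
    parts = remaining.rsplit(" ", 1)
    if len(parts) == 2:
        last = parts[1]
        looks_like_path = (
            last.endswith(("/", "\\"))
            or "/" in last
            or "\\" in last
            or os.path.isdir(last)  # the `test -d` check
        )
        if looks_like_path:
            paper_reference, output_path = parts[0], last

    return {"model": model, "paper_reference": paper_reference, "output_path": output_path}
```
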
### Phase 2: Launch Subagent

Use the Task tool with `subagent_type: "general-purpose"` and the selected model to spawn a subagent with the following prompt:

```
Analyze an academic paper in depth.

PAPER REFERENCE: {paper_reference}
OUTPUT PATH: {output_path or "Print to conversation"}

Instructions

1. **Resolve and fetch the paper.** Use the `read_paper` MCP tool with the paper reference. This returns JSON with:
   - `cache_path`: Path to the cached markdown file
   - `title`, `authors`, `year`, `venue`, `citation_count`: Paper metadata

   If `read_paper` fails, try using `search_papers` to find the paper and then `read_paper` with the resolved ID.

2. **Read the full paper text.** Use the Read tool to read the paper from `cache_path`. The References section has already been stripped to reduce size.

3. **Analyze the paper carefully.** Then extract the following structured analysis:

### Core Contribution
What is the main contribution of this paper? What problem does it solve and what is novel about the approach?

### Methodology
Describe the technical approach. What models, algorithms, or frameworks are introduced? Include key equations or formulations if they are central to the contribution.

### Key Results
What are the main experimental results? Include specific numbers, benchmarks, and comparisons to baselines where available.

### Limitations
What limitations do the authors acknowledge? What limitations are apparent but not stated?

### Future Work
What directions for future work do the authors suggest? What open questions remain?

### Key References
List 3-5 of the most important references cited in this paper that a reader should also look at.

4. **Output the analysis.**
   - **If no output path was provided:** Print the structured analysis as markdown to the conversation
   - **If an output path was provided:**
     1. Create a slug from the paper title (lowercase, hyphens, alphanumeric only)
     2. Create the output directory if it doesn't exist (use `mkdir -p`)
     3. Resolve relative paths relative to the current working directory
     4. Write the analysis to `{output_path}/{slug}.md`
     5. Inform the user where the file was saved

Use the output format specified below.
```

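The slug-and-save procedure in step 4 of the prompt above is terse; here is a sketch of one conforming implementation. `save_analysis` is a hypothetical helper, and the subagent actually performs these steps with its shell and file tools.

```python
import re
from pathlib import Path


def save_analysis(title: str, output_path: str, analysis_md: str) -> Path:
    """Hypothetical sketch of step 4: slugify the title and write the analysis."""
    # Lowercase, alphanumeric only, everything else collapsed into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    out_dir = Path(output_path).resolve()        # relative paths resolve against the CWD
    out_dir.mkdir(parents=True, exist_ok=True)   # equivalent of `mkdir -p`
    target = out_dir / f"{slug}.md"
    target.write_text(analysis_md)
    return target
```
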
### Phase 3: Report Results

After the subagent completes:
1. If an error occurred, inform the user
2. If the analysis was written to a file, confirm the file location
3. If the analysis was printed to the conversation, the subagent will have already displayed it

## Output Format

```
# Analysis: {Paper Title}

**Authors:** {authors}
**Year:** {year} | **Venue:** {venue} | **Citations:** {count}

## TL;DR
{One-paragraph summary of the paper}

## Core Contribution
{...}

## Methodology
{...}

## Key Results
{...}

## Limitations
{...}

## Future Work
{...}

## Key References
- {ref 1}
- {ref 2}
- ...
```

researchbot-0.1.0/.claude/commands/compare.md
@@ -0,0 +1,98 @@

# Compare Papers

Compare two academic papers, focusing on problem formulation and methodology.

## Input

The user provides exactly 2 paper references as `$ARGUMENTS`. Each reference can be:
- An arXiv ID (e.g. `ARXIV:1706.03762`)
- An arXiv URL (e.g. `https://arxiv.org/abs/1706.03762`)
- A direct PDF URL (e.g. `https://arxiv.org/pdf/1706.03762.pdf`)
- A DOI (e.g. `DOI:10.18653/v1/N19-1423`)
- A Semantic Scholar URL or ID
- A paper name (e.g. `Attention is All You Need`)

References are separated by spaces or commas. If a paper name contains spaces, the user may quote it.

Examples:
- `/compare Mamba S4`
- `/compare ARXIV:2312.00752 ARXIV:2111.00396`
- `/compare https://arxiv.org/abs/2312.00752 https://arxiv.org/abs/2111.00396`
- `/compare "Attention is All You Need" "Mamba: Linear-Time Sequence Modeling"`

## Instructions

1. **Parse the input.** Extract exactly 2 paper references from `$ARGUMENTS`. If fewer or more than 2 papers are provided, inform the user and ask for clarification. The sketch below illustrates quote-aware parsing.
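
Naive whitespace splitting would break quoted names, so a sketch can lean on `shlex`, which honors quotes. Hypothetical helper, not code shipped in the package:

```python
import shlex


def parse_compare_args(arguments: str) -> list[str]:
    """Hypothetical sketch: split $ARGUMENTS into exactly two references."""
    # shlex keeps "Attention is All You Need" as one token; stray commas
    # between unquoted references are stripped away.
    refs = [tok.strip(",") for tok in shlex.split(arguments) if tok.strip(",")]
    if len(refs) != 2:
        raise ValueError(f"Expected exactly 2 paper references, got {len(refs)}")
    return refs
```
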
2. **Resolve and fetch each paper.** For each paper reference:
   - Use the `read_paper` MCP tool, which returns JSON with:
     - `cache_path`: Path to the cached markdown file
     - `title`, `authors`, `year`, `venue`, `citation_count`: Paper metadata
   - If resolution fails, try `search_papers` to find the paper first

3. **Read both papers.** Use the Read tool to read each paper from its `cache_path`. The References section has already been stripped to reduce size. Analyze the full text of each paper before comparing.

4. **Compare on two primary dimensions:**

   **Problem Formulation:**
   - How does each paper frame the problem?
   - What assumptions does each paper make?
   - What are they fundamentally trying to achieve?
   - What's the key difference in how they conceptualize the problem?

   **Methodology:**
   - What technical approach does each paper take?
   - What are the key components or innovations?
   - What trade-offs does each approach make?
   - What's fundamentally different about how they solve the problem?

5. **Check for a focus aspect.** If the user specifies a particular aspect to compare (e.g., "compare on scalability", "focus on experimental setup"), emphasize that aspect in your comparison while still covering the primary dimensions.

6. **Output the comparison directly.** Print the structured comparison as markdown to the conversation. Do NOT write to any file.

## Output Format

Use short names for papers (e.g., "Mamba" instead of full title) throughout the comparison for readability.

```markdown
# Comparison: {Paper 1 short name} vs {Paper 2 short name}

## Papers

| Paper | Year | Venue | Citations |
|-------|------|-------|-----------|
| {Full Title 1} | {year} | {venue} | {count} |
| {Full Title 2} | {year} | {venue} | {count} |

## Problem Formulation

### {Paper 1 short name}
{How this paper frames the problem. What are they trying to solve? What assumptions do they make? What constraints or requirements do they identify?}

### {Paper 2 short name}
{How this paper frames the problem. What are they trying to solve? What assumptions do they make? What constraints or requirements do they identify?}

### Key Differences in Problem Framing
{What's fundamentally different about how each paper conceptualizes the problem? Do they make different assumptions? Target different constraints? Frame success differently?}

## Methodology

### {Paper 1 short name}
{Technical approach. Key components and how they work. Main innovations or contributions.}

### {Paper 2 short name}
{Technical approach. Key components and how they work. Main innovations or contributions.}

### Key Methodological Differences
{What's fundamentally different about the approaches? What trade-offs does each make? Where would each approach be preferred?}

## Summary

{One paragraph synthesizing the key takeaways. When would you use one approach vs the other? What does each paper contribute that the other doesn't?}
```

## Notes

- Be specific and technical in your comparisons. Avoid vague statements like "both papers address the problem well."
- When comparing methodology, focus on the "why" behind design choices, not just the "what."
- If papers have different scopes (e.g., one is more theoretical, one more empirical), acknowledge this in your comparison.

researchbot-0.1.0/.claude/commands/expand.md
@@ -0,0 +1,152 @@

# Expand: Find Related Works

Find and analyze papers solving the same problem as a seed paper.

## Input

The user provides a seed paper reference as `$ARGUMENTS`. This can be:
- An arXiv ID (e.g. `ARXIV:1706.03762`)
- An arXiv URL (e.g. `https://arxiv.org/abs/1706.03762`)
- A direct PDF URL (e.g. `https://arxiv.org/pdf/1706.03762.pdf`)
- A DOI (e.g. `DOI:10.18653/v1/N19-1423`)
- A Semantic Scholar URL or ID
- A paper name (e.g. `Attention is All You Need`)

Examples:
- `/expand Mamba`
- `/expand ARXIV:2312.00752`
- `/expand https://arxiv.org/abs/1706.03762`
- `/expand "Attention is All You Need"`

## Instructions

### Phase 1: Analyze the Seed Paper

1. **Resolve and fetch the seed paper.** Use `read_paper` with the provided reference. This returns JSON with:
   - `cache_path`: Path to the cached markdown file
   - `title`, `authors`, `year`, `venue`, `citation_count`: Paper metadata

   If it fails, use `search_papers` to find the paper first.

2. **Read the full paper text.** Use the Read tool to read the paper from `cache_path`. The References section has already been stripped.

3. **Extract the specific research problem.** Read the paper carefully and identify:
   - The **specific problem** being addressed (not the broad topic)
   - The problem should be stated in one sentence
   - Example: "Efficient sequence modeling with linear complexity" (not "machine learning" or "NLP")

4. **Generate a folder slug.** Create a slug from the research problem (lowercase, hyphens, 3-5 words).
   - Example: `efficient-sequence-modeling` or `low-rank-adaptation-llms`

### Phase 2: Search Exhaustively

Generate 3-5 targeted search queries based on the problem statement, then search using ALL of these methods:

1. **Keyword search.** Use `search_papers` with each query (limit 20 per query)
2. **Citations.** Use `get_citations` on the seed paper (limit 50)
3. **References.** Use `get_references` on the seed paper (limit 50)
4. **Similar papers.** Use `search_similar` on the seed paper (limit 20)

Deduplicate results by paper ID. You should have 50-150 candidate papers.
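
The four methods overlap heavily, so deduplication matters. A minimal sketch, assuming each method returns dicts keyed by a `paperId` field; that field name is an assumption modeled on the Semantic Scholar API, not confirmed from this package:

```python
def dedupe_candidates(*result_lists: list[dict]) -> list[dict]:
    """Hypothetical sketch: merge search results, keeping one entry per paper ID."""
    seen: dict[str, dict] = {}
    for results in result_lists:
        for paper in results:
            # First occurrence wins; later duplicates are dropped.
            seen.setdefault(paper["paperId"], paper)
    return list(seen.values())


# e.g. candidates = dedupe_candidates(keyword_hits, citations, references, similar)
```
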
### Phase 3: Parallel Paper Analysis (Subagents)

For each candidate paper, spawn a subagent to analyze it. Run subagents in parallel (batch 10-15 at a time).

**Subagent prompt template** (use Task tool with `subagent_type: "general-purpose"` and `model: "sonnet"`):

```
Analyze whether this paper solves the SAME PROBLEM as the seed paper.

SEED PAPER:
- Title: {seed_title}
- Problem: {problem_statement}

CANDIDATE PAPER ID: {candidate_paper_id}

Instructions:
1. Use `get_paper` to get the candidate paper's metadata (title, abstract, year)
2. Read the abstract carefully
3. Determine: Does this paper solve the SAME SPECIFIC PROBLEM as the seed?
   - YES if it addresses the exact same problem (different approach is fine)
   - NO if it's merely related, uses similar methods, or addresses a broader/narrower problem

If YES, also extract:
- approach: One sentence describing how it tackles the problem
- key_difference: How does it differ from the seed paper's approach?
- contribution: The main takeaway

Return your analysis as JSON:
{
  "paper_id": "...",
  "title": "...",
  "year": ...,
  "relevant": true/false,
  "reason": "Why it is or isn't solving the same problem",
  "approach": "..." (only if relevant),
  "key_difference": "..." (only if relevant),
  "contribution": "..." (only if relevant)
}
```

### Phase 4: Synthesize Results

1. **Collect all subagent reports.** Filter to only relevant papers (those solving the same problem), skipping any report that fails to parse, per the Notes below. A sketch of tolerant parsing follows.
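
A minimal sketch of that collection step, assuming each report arrives as a raw string:

```python
import json


def collect_relevant(reports: list[str]) -> list[dict]:
    """Hypothetical sketch: parse subagent reports, keep relevant ones, skip bad JSON."""
    relevant = []
    for raw in reports:
        try:
            result = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed report: skip this paper and continue
        if result.get("relevant") is True:
            relevant.append(result)
    return relevant
```
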
2. **Group by approach.** If there are clear categories of approaches, group papers accordingly.

3. **Create the workspace folder.** Create `workspace/{slug}/` if it doesn't exist.

4. **Write the output file.** Write to `workspace/{slug}/related_works.md` using the format below.

## Output Format

Write to `workspace/{slug}/related_works.md`:

```markdown
# Related Works: {Research Problem}

**Seed paper:** {Title} ({year})
**Problem:** {One-sentence problem statement}
**Papers analyzed:** {N} candidates → {M} relevant

## Overview

{Brief synthesis: How many papers address this problem? What are the main approaches? Any clear trends over time?}

## Papers

### {Paper 1 Title} ({year})

**Approach:** {How this paper tackles the problem}
**Key difference from seed:** {What's different about their approach}
**Contribution:** {Main takeaway}

### {Paper 2 Title} ({year})

...

## Approach Categories

{Group papers by their approach if there are 2+ clear categories. Otherwise, omit this section.}

### {Category 1 Name}
- {Paper A}: {brief description}
- {Paper B}: {brief description}

### {Category 2 Name}
- {Paper C}: {brief description}
- {Paper D}: {brief description}

## Summary

{What approaches exist to solve this problem? What trade-offs do they make? What does the seed paper contribute relative to this landscape? What's missing or underexplored?}
```

## Notes

- **Strict relevance filter.** Only include papers that solve the SAME problem. "Related" or "similar methods" is not enough. When in doubt, exclude.
- **Parallel execution.** Use the Task tool to spawn subagents in parallel. Send multiple Task tool calls in a single message.
- **Use the Sonnet model.** Always set `model: "sonnet"` when spawning subagents to balance speed and quality.
- **Handle failures gracefully.** If a subagent fails or returns invalid JSON, skip that paper and continue.
- **Inform the user.** After completing, tell the user where the file was written and give a brief summary (e.g., "Analyzed 127 candidates, found 23 papers solving the same problem").

researchbot-0.1.0/.claude/commands/gaps.md
@@ -0,0 +1,172 @@

# Gaps: Identify Research Gaps

Analyze a related works document to identify gaps, open questions, and promising research directions.

## Input

The user provides a workspace folder path as `$ARGUMENTS`. This folder should contain a `related_works.md` file created by `/expand`.

Examples:
- `/gaps workspace/efficient-sequence-modeling/`
- `/gaps workspace/low-rank-adaptation-llms/`

## Instructions

### Phase 1: Read and Understand the Landscape

1. **Read the related works document.** Use the Read tool to read `$ARGUMENTS/related_works.md`.

2. **Extract key information:**
   - The research problem being addressed
   - The seed paper and its approach
   - All related papers and their approaches
   - The approach categories (if present)
   - The existing summary/synthesis

3. **Build a mental model** of the research landscape:
   - What approaches have been tried?
   - What results have been achieved?
   - What trade-offs do different approaches make?
   - What's the current state of the art?

### Phase 2: Identify Gaps

Analyze the landscape systematically. For each category below, look for what's missing, unclear, or underexplored.

#### Methodological Gaps
- What techniques haven't been tried?
- What limitations do current approaches share?
- What combinations of methods are unexplored?
- What architectural choices are untested?
- Are there approaches from adjacent fields that haven't been applied?

#### Empirical Gaps
- What settings or domains haven't been tested?
- What scales (larger/smaller) are unexplored?
- What datasets or benchmarks are missing?
- Are results robust across different conditions?
- What ablations or analyses are missing?

#### Theoretical Gaps
- What phenomena lack explanation?
- What assumptions are untested or questionable?
- Why do certain approaches work (or not work)?
- What are the fundamental limits?
- What theoretical frameworks are missing?

#### Application Gaps
- What use cases haven't been explored?
- What domains could benefit but haven't been tried?
- What practical constraints haven't been addressed?
- What deployment scenarios are missing?

### Phase 3: Assess Each Gap

For each gap you identify, assess:

1. **Significance** — How important is filling this gap?
   - Would it advance the field substantially?
   - Does it block progress on other fronts?
   - Would it have practical impact?

2. **Tractability** — How feasible is it to address?
   - **High**: Clear path forward, resources exist, could be done with current methods
   - **Medium**: Requires some innovation or significant effort, but achievable
   - **Low**: Fundamental challenges, unclear how to proceed, may require breakthroughs

3. **Evidence** — What from the papers supports this being a gap?
   - Which papers show this limitation?
   - What's been tried vs. what hasn't?

4. **Potential approach** — How might this gap be addressed?
   - What would a solution look like?
   - What would be needed (data, compute, new methods)?

### Phase 4: Rank Opportunities

Identify the **top 3-5 research opportunities** by combining significance and tractability:
- High significance + High tractability = Top opportunity
- High significance + Medium tractability = Strong opportunity
- Medium significance + High tractability = Good opportunity

For each top opportunity, explain why it's promising and what makes it actionable.
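
The rubric above is qualitative, but its shape is a lookup from (significance, tractability) to a priority tier. A toy sketch, purely illustrative; combinations the rubric does not list fall through to the lowest priority:

```python
# Hypothetical encoding of the ranking rubric above.
RUBRIC = {
    ("high", "high"): "top opportunity",
    ("high", "medium"): "strong opportunity",
    ("medium", "high"): "good opportunity",
}


def rank_gap(significance: str, tractability: str) -> str:
    """Map a gap's two ratings to a priority tier."""
    return RUBRIC.get((significance.lower(), tractability.lower()), "deprioritize")
```
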
### Phase 5: Write Output

Write the analysis to `$ARGUMENTS/gaps.md` using the format below.

## Output Format

Write to `$ARGUMENTS/gaps.md`:

```markdown
# Research Gaps: {Research Problem}

**Based on:** related_works.md ({N} papers analyzed)
**Seed paper:** {Title}

## Summary

{One paragraph overview: What's the state of the field? What are the major gaps? What's the most promising direction for new research?}

## Methodological Gaps

### {Gap Title}
**Description:** {What's missing or limited in current approaches}
**Evidence:** {Which papers show this limitation, what has/hasn't been tried}
**Potential approach:** {How this might be addressed}
**Tractability:** High / Medium / Low

### {Gap Title}
...

## Empirical Gaps

### {Gap Title}
**Description:** {What settings, domains, or scales are untested}
**Evidence:** {What's been tested vs. what hasn't}
**Potential approach:** {How this might be addressed}
**Tractability:** High / Medium / Low

### {Gap Title}
...

## Theoretical Gaps

### {Gap Title}
**Description:** {What's not understood or explained}
**Evidence:** {What questions remain open}
**Potential approach:** {How this might be addressed}
**Tractability:** High / Medium / Low

### {Gap Title}
...

## Application Gaps

### {Gap Title}
**Description:** {What use cases or domains are underexplored}
**Evidence:** {What applications haven't been tried}
**Potential approach:** {How this might be addressed}
**Tractability:** High / Medium / Low

### {Gap Title}
...

## Top Opportunities

{Rank the top 3-5 gaps by a combination of significance and tractability. These are the most promising research directions.}

1. **{Gap name}** — {Why this is promising: what makes it significant AND tractable}
2. **{Gap name}** — {Why this is promising}
3. **{Gap name}** — {Why this is promising}
```

## Notes

- **Be specific.** Vague gaps like "needs more research" are not useful. Identify concrete, actionable gaps.
- **Ground in evidence.** Every gap should be supported by what the papers do or don't address.
- **Quality over quantity.** It's better to identify 2-3 significant gaps per category than to list every possible limitation.
- **Omit empty categories.** If there are no meaningful gaps in a category, omit that section entirely.
- **Focus on actionable gaps.** Prioritize gaps that could realistically be addressed by a research project.
- **Inform the user.** After completing, tell the user where the file was written and highlight the top 2-3 opportunities.