@wentorai/research-plugins 1.2.2 → 1.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +16 -8
- package/openclaw.plugin.json +10 -3
- package/package.json +2 -5
- package/skills/analysis/dataviz/SKILL.md +25 -0
- package/skills/analysis/dataviz/chart-image-generator/SKILL.md +1 -1
- package/skills/analysis/econometrics/SKILL.md +23 -0
- package/skills/analysis/econometrics/robustness-checks/SKILL.md +1 -1
- package/skills/analysis/statistics/SKILL.md +21 -0
- package/skills/analysis/statistics/data-anomaly-detection/SKILL.md +1 -1
- package/skills/analysis/statistics/ml-experiment-tracker/SKILL.md +1 -1
- package/skills/analysis/statistics/{senior-data-scientist-guide → modeling-strategy-guide}/SKILL.md +5 -5
- package/skills/analysis/wrangling/SKILL.md +21 -0
- package/skills/analysis/wrangling/csv-data-analyzer/SKILL.md +1 -1
- package/skills/analysis/wrangling/data-cog-guide/SKILL.md +1 -1
- package/skills/domains/ai-ml/SKILL.md +37 -0
- package/skills/domains/biomedical/SKILL.md +28 -0
- package/skills/domains/biomedical/genomas-guide/SKILL.md +1 -1
- package/skills/domains/biomedical/med-researcher-guide/SKILL.md +1 -1
- package/skills/domains/biomedical/medgeclaw-guide/SKILL.md +1 -1
- package/skills/domains/business/SKILL.md +17 -0
- package/skills/domains/business/architecture-design-guide/SKILL.md +1 -1
- package/skills/domains/chemistry/SKILL.md +19 -0
- package/skills/domains/chemistry/computational-chemistry-guide/SKILL.md +1 -1
- package/skills/domains/cs/SKILL.md +21 -0
- package/skills/domains/ecology/SKILL.md +16 -0
- package/skills/domains/economics/SKILL.md +20 -0
- package/skills/domains/economics/post-labor-economics/SKILL.md +1 -1
- package/skills/domains/economics/pricing-psychology-guide/SKILL.md +1 -1
- package/skills/domains/education/SKILL.md +19 -0
- package/skills/domains/education/academic-study-methods/SKILL.md +1 -1
- package/skills/domains/education/edumcp-guide/SKILL.md +1 -1
- package/skills/domains/finance/SKILL.md +19 -0
- package/skills/domains/finance/akshare-finance-data/SKILL.md +1 -1
- package/skills/domains/finance/options-analytics-agent-guide/SKILL.md +1 -1
- package/skills/domains/finance/stata-accounting-research/SKILL.md +1 -1
- package/skills/domains/geoscience/SKILL.md +17 -0
- package/skills/domains/humanities/SKILL.md +16 -0
- package/skills/domains/humanities/history-research-guide/SKILL.md +1 -1
- package/skills/domains/humanities/political-history-guide/SKILL.md +1 -1
- package/skills/domains/law/SKILL.md +19 -0
- package/skills/domains/math/SKILL.md +17 -0
- package/skills/domains/pharma/SKILL.md +17 -0
- package/skills/domains/physics/SKILL.md +16 -0
- package/skills/domains/social-science/SKILL.md +17 -0
- package/skills/domains/social-science/sociology-research-methods/SKILL.md +1 -1
- package/skills/literature/discovery/SKILL.md +20 -0
- package/skills/literature/discovery/paper-recommendation-guide/SKILL.md +1 -1
- package/skills/literature/discovery/semantic-paper-radar/SKILL.md +1 -1
- package/skills/literature/fulltext/SKILL.md +26 -0
- package/skills/literature/metadata/SKILL.md +35 -0
- package/skills/literature/metadata/doi-content-negotiation/SKILL.md +4 -0
- package/skills/literature/metadata/doi-resolution-guide/SKILL.md +4 -0
- package/skills/literature/metadata/orcid-api/SKILL.md +4 -0
- package/skills/literature/metadata/orcid-integration-guide/SKILL.md +4 -0
- package/skills/literature/search/SKILL.md +43 -0
- package/skills/literature/search/paper-search-mcp-guide/SKILL.md +1 -1
- package/skills/research/automation/SKILL.md +21 -0
- package/skills/research/deep-research/SKILL.md +24 -0
- package/skills/research/deep-research/auto-deep-research-guide/SKILL.md +1 -1
- package/skills/research/deep-research/in-depth-research-guide/SKILL.md +1 -1
- package/skills/research/funding/SKILL.md +20 -0
- package/skills/research/methodology/SKILL.md +24 -0
- package/skills/research/paper-review/SKILL.md +19 -0
- package/skills/research/paper-review/paper-critique-framework/SKILL.md +1 -1
- package/skills/tools/code-exec/SKILL.md +18 -0
- package/skills/tools/diagram/SKILL.md +20 -0
- package/skills/tools/document/SKILL.md +21 -0
- package/skills/tools/knowledge-graph/SKILL.md +21 -0
- package/skills/tools/ocr-translate/SKILL.md +18 -0
- package/skills/tools/ocr-translate/handwriting-recognition-guide/SKILL.md +2 -0
- package/skills/tools/ocr-translate/latex-ocr-guide/SKILL.md +2 -0
- package/skills/tools/scraping/SKILL.md +17 -0
- package/skills/writing/citation/SKILL.md +33 -0
- package/skills/writing/citation/zotfile-attachment-guide/SKILL.md +2 -0
- package/skills/writing/composition/SKILL.md +22 -0
- package/skills/writing/composition/research-paper-writer/SKILL.md +1 -1
- package/skills/writing/composition/scientific-writing-wrapper/SKILL.md +1 -1
- package/skills/writing/latex/SKILL.md +22 -0
- package/skills/writing/latex/academic-writing-latex/SKILL.md +1 -1
- package/skills/writing/latex/latex-drawing-guide/SKILL.md +1 -1
- package/skills/writing/polish/SKILL.md +20 -0
- package/skills/writing/polish/chinese-text-humanizer/SKILL.md +1 -1
- package/skills/writing/templates/SKILL.md +22 -0
- package/skills/writing/templates/beamer-presentation-guide/SKILL.md +1 -1
- package/skills/writing/templates/scientific-article-pdf/SKILL.md +1 -1
- package/skills/analysis/dataviz/citation-map-guide/SKILL.md +0 -184
- package/skills/analysis/dataviz/data-visualization-principles/SKILL.md +0 -171
- package/skills/analysis/econometrics/empirical-paper-analysis/SKILL.md +0 -192
- package/skills/analysis/econometrics/panel-data-regression-workflow/SKILL.md +0 -267
- package/skills/analysis/econometrics/stata-regression/SKILL.md +0 -117
- package/skills/analysis/statistics/general-statistics-guide/SKILL.md +0 -226
- package/skills/analysis/statistics/infiagent-benchmark-guide/SKILL.md +0 -106
- package/skills/analysis/statistics/pywayne-statistics-guide/SKILL.md +0 -192
- package/skills/analysis/statistics/quantitative-methods-guide/SKILL.md +0 -193
- package/skills/analysis/wrangling/claude-data-analysis-guide/SKILL.md +0 -100
- package/skills/analysis/wrangling/open-data-scientist-guide/SKILL.md +0 -197
- package/skills/domains/ai-ml/annotated-dl-papers-guide/SKILL.md +0 -159
- package/skills/domains/humanities/digital-humanities-methods/SKILL.md +0 -232
- package/skills/domains/law/legal-research-methods/SKILL.md +0 -190
- package/skills/domains/social-science/sociology-research-guide/SKILL.md +0 -238
- package/skills/literature/discovery/arxiv-paper-monitoring/SKILL.md +0 -233
- package/skills/literature/discovery/paper-tracking-guide/SKILL.md +0 -211
- package/skills/literature/fulltext/zotero-scihub-guide/SKILL.md +0 -168
- package/skills/literature/search/arxiv-osiris/SKILL.md +0 -199
- package/skills/literature/search/deepgit-search-guide/SKILL.md +0 -147
- package/skills/literature/search/multi-database-literature-search/SKILL.md +0 -198
- package/skills/literature/search/papers-chat-guide/SKILL.md +0 -194
- package/skills/literature/search/pasa-paper-search-guide/SKILL.md +0 -138
- package/skills/literature/search/scientify-literature-survey/SKILL.md +0 -203
- package/skills/research/automation/ai-scientist-guide/SKILL.md +0 -228
- package/skills/research/automation/coexist-ai-guide/SKILL.md +0 -149
- package/skills/research/automation/foam-agent-guide/SKILL.md +0 -203
- package/skills/research/automation/research-paper-orchestrator/SKILL.md +0 -254
- package/skills/research/deep-research/academic-deep-research/SKILL.md +0 -190
- package/skills/research/deep-research/cognitive-kernel-guide/SKILL.md +0 -200
- package/skills/research/deep-research/corvus-research-guide/SKILL.md +0 -132
- package/skills/research/deep-research/deep-research-pro/SKILL.md +0 -213
- package/skills/research/deep-research/deep-research-work/SKILL.md +0 -204
- package/skills/research/deep-research/research-cog/SKILL.md +0 -153
- package/skills/research/methodology/academic-mentor-guide/SKILL.md +0 -169
- package/skills/research/methodology/deep-innovator-guide/SKILL.md +0 -242
- package/skills/research/methodology/research-pipeline-units-guide/SKILL.md +0 -169
- package/skills/research/paper-review/paper-compare-guide/SKILL.md +0 -238
- package/skills/research/paper-review/paper-digest-guide/SKILL.md +0 -240
- package/skills/research/paper-review/paper-research-assistant/SKILL.md +0 -231
- package/skills/research/paper-review/research-quality-filter/SKILL.md +0 -261
- package/skills/tools/code-exec/contextplus-mcp-guide/SKILL.md +0 -110
- package/skills/tools/diagram/clawphd-guide/SKILL.md +0 -149
- package/skills/tools/diagram/scientific-graphical-abstract/SKILL.md +0 -201
- package/skills/tools/document/md2pdf-xelatex/SKILL.md +0 -212
- package/skills/tools/document/openpaper-guide/SKILL.md +0 -232
- package/skills/tools/document/weknora-guide/SKILL.md +0 -216
- package/skills/tools/knowledge-graph/mimir-memory-guide/SKILL.md +0 -135
- package/skills/tools/knowledge-graph/open-webui-tools-guide/SKILL.md +0 -156
- package/skills/tools/ocr-translate/formula-recognition-guide/SKILL.md +0 -367
- package/skills/tools/ocr-translate/math-equation-renderer/SKILL.md +0 -198
- package/skills/tools/scraping/api-data-collection-guide/SKILL.md +0 -301
- package/skills/writing/citation/academic-citation-manager-guide/SKILL.md +0 -182
- package/skills/writing/composition/opendraft-thesis-guide/SKILL.md +0 -200
- package/skills/writing/composition/paper-debugger-guide/SKILL.md +0 -143
- package/skills/writing/composition/paperforge-guide/SKILL.md +0 -205
@@ -1,194 +0,0 @@
---
name: papers-chat-guide
description: "Conversational interface for querying and discussing papers"
metadata:
  openclaw:
    emoji: "💬"
    category: "literature"
    subcategory: "search"
    keywords: ["paper chat", "conversational search", "paper QA", "document QA", "RAG papers", "literature chat"]
    source: "https://github.com/paperswithcode/galai"
---

# Papers Chat Guide

## Overview

Papers Chat systems provide conversational interfaces for querying, discussing, and understanding academic papers. Instead of keyword searches, researchers ask natural language questions and get answers grounded in specific papers with citations. This guide covers building and using RAG-based paper chat systems, from local document Q&A to multi-paper discussion interfaces. Useful for literature comprehension, paper comparison, and research exploration.

## Architecture

```
User Question
    ↓
Query Understanding (expand, decompose)
    ↓
Retrieval (vector search over paper chunks)
    ↓
Re-ranking (cross-encoder relevance scoring)
    ↓
Answer Generation (grounded in retrieved passages)
    ↓
Response + Citations + Follow-up Suggestions
```
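
The pipeline above can be sketched end-to-end in plain Python. This is a toy illustration, not the `papers_chat` implementation: it substitutes bag-of-words similarity for real embeddings, skips the cross-encoder re-ranking stage, and all names (`embed`, `answer`, the sample chunks) are invented for the example.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; real systems use a sentence-transformer model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def answer(question, chunks, top_k=2):
    # Retrieval: rank all chunks by similarity to the question.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    # Re-ranking would apply a cross-encoder here; we keep the top_k as-is.
    context = ranked[:top_k]
    # Generation: a real system prompts an LLM with the retrieved passages.
    return {"passages": context, "citations": [c["paper"] for c in context]}

chunks = [
    {"paper": "transformer.pdf", "text": "attention weights are computed with scaled dot product attention"},
    {"paper": "bert.pdf", "text": "masked language modeling pretrains bidirectional encoders"},
]
result = answer("how is attention computed", chunks, top_k=1)
print(result["citations"])  # ['transformer.pdf']
```

Every stage in the diagram maps onto one step of `answer`; a production system swaps each step for a real model without changing the overall flow.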

## Local Paper Chat

```python
from papers_chat import PaperChat

chat = PaperChat(
    llm_provider="anthropic",
    embedding_model="all-MiniLM-L6-v2",
)

# Index papers
chat.add_papers([
    "papers/attention_is_all_you_need.pdf",
    "papers/bert.pdf",
    "papers/gpt3.pdf",
])

# Ask questions
response = chat.ask(
    "How does the attention mechanism in Transformers differ "
    "from the attention used in earlier seq2seq models?"
)

print(response.answer)
for cite in response.citations:
    print(f"  [{cite.paper}] p.{cite.page}: {cite.excerpt[:80]}...")
```

## Multi-Paper Discussion

```python
# Compare across papers
response = chat.ask(
    "Compare the pre-training objectives of BERT and GPT-3. "
    "What are the trade-offs?"
)

# Follow-up in conversation
response = chat.follow_up(
    "Which approach works better for few-shot learning?"
)

# Paper-specific questions
response = chat.ask(
    "What is the computational complexity of multi-head attention?",
    scope=["attention_is_all_you_need.pdf"],
)
```

## Building a Paper Index

```python
from papers_chat import PaperIndex

index = PaperIndex(
    embedding_model="all-MiniLM-L6-v2",
    chunk_size=512,
    chunk_overlap=64,
    storage_path="./paper_index",
)

# Add individual paper
index.add_paper(
    path="paper.pdf",
    metadata={
        "title": "Attention Is All You Need",
        "authors": ["Vaswani et al."],
        "year": 2017,
        "venue": "NeurIPS",
    },
)

# Add directory of papers
index.add_directory(
    "papers/",
    extract_metadata=True,  # Auto-extract from PDF
)

# Search
results = index.search("positional encoding", top_k=5)
for r in results:
    print(f"[{r.paper_title}] (score: {r.score:.3f})")
    print(f"  {r.text[:120]}...")
```

## RAG Pipeline Configuration

```python
from papers_chat import RAGConfig

chat = PaperChat(
    llm_provider="anthropic",
    rag_config=RAGConfig(
        # Retrieval
        retrieval_top_k=20,
        rerank_top_k=5,
        reranker="cross-encoder/ms-marco-MiniLM-L-6-v2",

        # Chunking
        chunk_size=512,
        chunk_overlap=64,
        chunk_by="paragraph",  # paragraph, sentence, fixed

        # Generation
        citation_style="inline",  # inline, footnote, endnote
        max_answer_length=500,
        include_quotes=True,
    ),
)
```

## Batch Question Answering

```python
# Process a list of research questions
questions = [
    "What datasets are used for evaluating language models?",
    "How is perplexity calculated and what are its limitations?",
    "What are the main approaches to reducing model size?",
]

results = chat.batch_ask(questions)

for q, r in zip(questions, results):
    print(f"Q: {q}")
    print(f"A: {r.answer[:200]}...")
    print(f"Sources: {[c.paper for c in r.citations]}")
    print()
```

## Table and Figure Extraction

```python
# Query specific paper elements
response = chat.ask(
    "What are the BLEU scores reported in Table 2?",
    scope=["attention_is_all_you_need.pdf"],
    include_tables=True,
)

# Extract all tables from a paper
tables = chat.extract_tables("paper.pdf")
for table in tables:
    print(f"Table {table.number}: {table.caption}")
    print(table.to_dataframe())
```

## Use Cases

1. **Literature comprehension**: Ask clarifying questions about papers
2. **Paper comparison**: Cross-paper analysis and synthesis
3. **Research exploration**: Discover connections across literature
4. **Study groups**: Collaborative paper discussion
5. **Quick reference**: Find specific results, methods, or citations

## References

- [Galactica](https://github.com/paperswithcode/galai) — Language model for science
- [LangChain RAG](https://python.langchain.com/docs/use_cases/question_answering/)
- [LlamaIndex](https://www.llamaindex.ai/) — Data framework for LLM applications
@@ -1,138 +0,0 @@
---
name: pasa-paper-search-guide
description: "Advanced paper search agent powered by LLMs for literature discovery"
version: 1.0.0
author: wentor-community
source: https://github.com/pasa-agent/pasa
metadata:
  openclaw:
    category: "literature"
    subcategory: "search"
    keywords:
      - paper-search
      - literature-discovery
      - semantic-search
      - citation-graph
      - academic-databases
      - query-expansion
---

# PASA Paper Search Guide

A skill for conducting advanced academic paper searches using LLM-powered query expansion, semantic ranking, and citation-graph exploration. Based on the PASA project (2K stars), this skill transforms simple research questions into comprehensive, systematic literature discovery workflows.

## Overview

Finding relevant papers is the foundation of all academic research, yet traditional keyword searches miss semantically related work, and manual citation chasing is time-consuming. PASA addresses this by combining LLM-driven query understanding with multi-source search and intelligent result ranking. The agent acts as a search co-pilot, helping researchers cast a wide net and then systematically narrow results to the most relevant papers.

This skill is designed for researchers at any career stage who want to go beyond simple database searches and build thorough, reproducible literature collections for reviews, grant proposals, or new research directions.

## Search Strategy Design

Before executing any search, the agent helps researchers design a comprehensive strategy:

**Query Formulation**
- Decompose the research question into key concepts and their relationships
- Identify primary terms, synonyms, and related terminology for each concept
- Consider field-specific jargon and cross-disciplinary terminology differences
- Build Boolean query strings combining concepts with AND/OR operators
- Generate semantic search queries in natural language for embedding-based retrieval
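
The Boolean-query step above can be mechanized. The sketch below is an illustration, not part of PASA; `build_boolean_query` is an invented name. It ORs the synonyms within each concept and ANDs the concepts together:

```python
def build_boolean_query(concepts):
    # Each concept is a list of synonyms: OR within a concept, AND across concepts.
    groups = []
    for terms in concepts:
        # Quote multi-word phrases so databases treat them as exact phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)

concepts = [
    ["battery", "lithium-ion cell"],
    ["remaining useful life", "RUL"],
    ["prediction", "forecasting"],
]
print(build_boolean_query(concepts))
# (battery OR "lithium-ion cell") AND ("remaining useful life" OR RUL) AND (prediction OR forecasting)
```

The same concept lists can also be joined into plain sentences to produce the natural-language variants used for embedding-based retrieval.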

**Source Selection**
- Identify appropriate databases for the research domain (Semantic Scholar, OpenAlex, PubMed, IEEE Xplore, ACL Anthology, arXiv, SSRN)
- Consider preprint servers alongside peer-reviewed databases
- Include grey literature sources when appropriate (dissertations, reports, conference proceedings)
- Plan for cross-database deduplication
- Document the search date and database coverage dates

**Scope Definition**
- Set date range filters based on the research question
- Define inclusion and exclusion criteria before searching
- Specify language restrictions and justify them
- Determine minimum quality thresholds (peer-review status, impact metrics)
- Plan the stopping rule (saturation, maximum count, date boundary)

## Execution Workflow

The search execution follows a systematic multi-phase approach:

**Phase 1: Broad Sweep**
- Execute the designed queries across all selected databases
- Collect metadata (title, authors, abstract, venue, year, citation count)
- Record the number of results per query per database
- Remove exact duplicates using DOI and title matching
- Generate initial statistics (total results, date distribution, venue distribution)
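
The deduplication step (DOI first, normalized title as fallback) can be sketched like this. This is illustrative only; the record shape is assumed:

```python
def dedupe(records):
    # Prefer DOI as the identity key; fall back to a normalized title.
    seen, unique = set(), []
    for r in records:
        key = r.get("doi") or " ".join(r["title"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

records = [
    {"doi": "10.1/x", "title": "Attention Is All You Need"},
    {"doi": "10.1/x", "title": "Attention is all you need"},   # same DOI, case differs
    {"doi": None, "title": "BERT:  Pre-training"},
    {"doi": None, "title": "bert: pre-training"},              # same title after normalization
]
print(len(dedupe(records)))  # 2
```

Lowercasing and collapsing whitespace catches most cross-database title variants; fuzzy matching would catch more at the cost of occasional false merges.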

**Phase 2: Semantic Ranking**
- Encode the research question and all abstracts into embedding space
- Rank results by semantic similarity to the core research question
- Identify clusters of thematically similar papers
- Flag highly cited papers that appear in multiple query results
- Surface unexpected but potentially relevant papers from the long tail

**Phase 3: Citation Expansion**
- For the top-ranked papers, retrieve their reference lists
- For the top-ranked papers, retrieve papers that cite them
- Apply the same relevance ranking to newly discovered papers
- Identify "hub" papers that connect multiple research threads
- Detect seminal works that appear frequently in citation chains

**Phase 4: Snowball Refinement**
- Check if newly discovered papers introduce terminology not in original queries
- If so, formulate additional queries with the new terms
- Repeat until reaching saturation (no significant new papers discovered)
- Document the complete search trail for reproducibility
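
The Phase 4 saturation loop can be sketched as follows. This is a hypothetical outline, not PASA's code: `search` and `extract_terms` are assumed callables supplied by the caller, and the tiny in-memory corpus exists only to make the example runnable:

```python
def snowball(initial_terms, search, extract_terms, max_rounds=5):
    # Re-search with newly discovered terminology until no new papers appear.
    seen_terms = set(initial_terms)
    papers, queue = {}, list(initial_terms)
    for _ in range(max_rounds):
        if not queue:
            break  # saturation: the last round produced no unseen terms
        new_papers = {}
        for term in queue:
            for p in search(term):
                if p["id"] not in papers:
                    new_papers[p["id"]] = p
        papers.update(new_papers)
        # Terms introduced by this round's papers seed the next round.
        queue = [t for p in new_papers.values()
                 for t in extract_terms(p) if t not in seen_terms]
        seen_terms.update(queue)
    return list(papers.values())

corpus = {
    "rul": [{"id": "A", "terms": ["state of health"]}],
    "state of health": [{"id": "B", "terms": []}],
}
found = snowball(["rul"], lambda t: corpus.get(t, []), lambda p: p["terms"])
print(sorted(p["id"] for p in found))  # ['A', 'B']
```

`max_rounds` is the safety valve: it bounds the loop even when each round keeps introducing fresh terminology, and the per-round term log doubles as the reproducible search trail.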

## Result Analysis

After search completion, the agent assists with analyzing the collected papers:

**Bibliometric Overview**
- Publication year distribution showing research activity trends
- Venue distribution identifying key journals and conferences
- Author co-occurrence networks highlighting prolific researchers
- Geographic distribution of research institutions
- Citation network statistics (density, clustering coefficient)

**Thematic Mapping**
- Cluster papers by topic using abstract embeddings
- Generate descriptive labels for each cluster
- Identify emerging themes with recent publication dates and low citation counts
- Map established themes with high citation density
- Highlight cross-cluster papers that bridge different research streams

**Gap Identification**
- Compare the thematic map against the original research question
- Identify aspects of the question with sparse literature coverage
- Note methodological approaches that are underrepresented
- Flag populations or contexts that have been understudied
- Suggest how identified gaps might shape the research direction

## PRISMA Compliance

For systematic reviews, the skill supports PRISMA-compliant reporting:

- Generate PRISMA flow diagrams with counts at each stage
- Document reasons for exclusion at each screening phase
- Track inter-rater agreement for screening decisions
- Produce exportable search documentation for supplementary materials
- Support both the original and the updated PRISMA 2020 guidelines

## Integration with Research-Claw

This skill connects seamlessly with the Research-Claw ecosystem:

- Export discovered papers to reference management tools (Zotero, BibTeX)
- Feed search results to the paper-to-agent skill for deep analysis
- Connect with writing skills for automated literature review drafting
- Store search strategies as reproducible templates for future use
- Share curated paper collections with collaborators via the platform

## Practical Tips

- Start broad and narrow incrementally rather than beginning with narrow searches
- Always search at least two independent databases to avoid source bias
- Record every query variation and its result count for the search audit trail
- Use citation-based expansion to discover older foundational works
- Check the references of the most recent relevant review articles
- Set calendar reminders to re-run searches periodically for living reviews
@@ -1,203 +0,0 @@
---
name: scientify-literature-survey
description: "Search, filter, download and cluster academic papers on a topic"
metadata:
  openclaw:
    emoji: "🔍"
    category: "literature"
    subcategory: "search"
    keywords: ["academic database search", "literature search", "search strategy", "semantic search", "citation tracking"]
    source: "https://github.com/scientify-ai/skills"
---

# Literature Survey

**Don't ask permission. Just do it.**

## Output Structure

```
~/.openclaw/workspace/projects/{project-id}/
├── survey/
│   ├── search_terms.json    # Search terms list
│   └── report.md            # Final report
├── papers/
│   ├── _downloads/          # Raw downloads
│   ├── _meta/               # Per-paper metadata
│   │   └── {arxiv_id}.json
│   └── {direction}/         # Organized by direction
├── repos/                   # Reference code repos (Phase 3)
│   ├── {repo_name_1}/
│   └── {repo_name_2}/
└── prepare_res.md           # Repo selection report (Phase 3)
```

## Workflow

### Phase 1: Preparation

```bash
ACTIVE=$(cat ~/.openclaw/workspace/projects/.active 2>/dev/null)
if [ -z "$ACTIVE" ]; then
  PROJECT_ID="<topic-slug>"
  mkdir -p ~/.openclaw/workspace/projects/$PROJECT_ID/{survey,papers/_downloads,papers/_meta}
  echo "$PROJECT_ID" > ~/.openclaw/workspace/projects/.active
fi
PROJECT_DIR="$HOME/.openclaw/workspace/projects/$(cat ~/.openclaw/workspace/projects/.active)"
```

Generate 4-8 search terms and save them to `survey/search_terms.json`.

### Phase 2: Incremental Search-Filter-Download Loop

**Repeat the following for each search term:**

#### 2.1 Search

```
arxiv_search({ query: "<term>", max_results: 30 })
```

#### 2.2 Instant Filtering

Score each returned paper immediately (1-5) and keep only those scoring >= 4.

Scoring criteria:
- 5: Core paper, directly on topic
- 4: Related method or application
- 3 and below: Skip
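
A minimal sketch of this filtering step (illustrative; the result shape is assumed from the metadata format written in step 2.4):

```python
def filter_results(results, min_score=4):
    # Keep only papers the agent scored as core (5) or related (4).
    kept = [r for r in results if r["score"] >= min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)

results = [
    {"arxiv_id": "2401.00001", "score": 5},
    {"arxiv_id": "2401.00002", "score": 3},
    {"arxiv_id": "2401.00003", "score": 4},
]
print([r["arxiv_id"] for r in filter_results(results)])
# ['2401.00001', '2401.00003']
```

Filtering inside the loop, before download, is what keeps the per-term context small.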

#### 2.3 Download Useful Papers

```
arxiv_download({
  arxiv_ids: ["<useful_paper_ids>"],
  output_dir: "$PROJECT_DIR/papers/_downloads"
})
```

#### 2.4 Write Metadata

For each downloaded paper, create `papers/_meta/{arxiv_id}.json`:

```json
{
  "arxiv_id": "2401.12345",
  "title": "...",
  "abstract": "...",
  "score": 5,
  "source_term": "battery RUL prediction",
  "downloaded_at": "2024-01-15T10:00:00Z"
}
```

**Complete one search term before proceeding to the next.** This prevents context pollution from large search results.

### Phase 3: GitHub Code Search & Reference Repo Selection

**Goal**: Provide reference implementations for downstream skills.

#### 3.1 Select High-Scoring Papers

Read metadata from `papers/_meta/` for papers scoring >= 4, then select the **Top 5** most relevant.

#### 3.2 Search Reference Repos

For each selected paper, search GitHub with keyword combinations:
- Paper title + "code" / "implementation"
- Core method name + author name
- Dataset name + task name from paper

Use the `github_search` tool:
```javascript
github_search({
  query: "{paper_title} implementation",
  max_results: 10,
  sort: "stars",
  language: "python"
})
```

#### 3.3 Filter & Clone

Evaluate repos by:
- Star count (recommend >100)
- Code quality (has README, requirements.txt, clear structure)
- Paper match (README references the paper / implements its method)

Select the **3-5** most relevant repos and clone them to `repos/`:

```bash
mkdir -p "$PROJECT_DIR/repos"
cd "$PROJECT_DIR/repos"
git clone --depth 1 <repo_url>
```

#### 3.4 Write Selection Report

Create `$PROJECT_DIR/prepare_res.md`:

```markdown
# Reference Repo Selection

| Repo | Paper | Stars | Reason |
|------|-------|-------|--------|
| repos/{repo_name} | {paper_title} (arxiv:{id}) | {N} | {reason} |

## Key Files per Repo

### {repo_name}
- **Model**: `model/` or `models/`
- **Training**: `train.py` or `main.py`
- **Data loading**: `data/` or `dataset.py`
- **Core file**: `{path}` - {description}
```

**If no repos are found**, note "No reference repos available" in `prepare_res.md`.

### Phase 4: Classification

After all search terms and code searches are complete:

#### 4.1 Read All Metadata

```bash
ls $PROJECT_DIR/papers/_meta/
```

Read all `.json` files and aggregate the paper list.
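
The aggregation step can be sketched in Python. This is an illustration, not part of the skill's tooling: it groups papers by their `source_term` from the `_meta/*.json` files, demonstrated here on a temporary directory so the example is self-contained:

```python
import json
import tempfile
from pathlib import Path
from collections import defaultdict

def aggregate_meta(meta_dir):
    # Read every _meta/*.json file and group papers by their source search term.
    by_term = defaultdict(list)
    for path in sorted(Path(meta_dir).glob("*.json")):
        meta = json.loads(path.read_text())
        by_term[meta["source_term"]].append(meta)
    return by_term

with tempfile.TemporaryDirectory() as d:
    Path(d, "2401.12345.json").write_text(json.dumps(
        {"arxiv_id": "2401.12345", "score": 5, "source_term": "battery RUL prediction"}))
    Path(d, "2402.00001.json").write_text(json.dumps(
        {"arxiv_id": "2402.00001", "score": 4, "source_term": "battery RUL prediction"}))
    grouped = aggregate_meta(d)
    print({term: len(papers) for term, papers in grouped.items()})
    # {'battery RUL prediction': 2}
```

Because classification reads only these small JSON files, the cluster analysis in 4.2 never has to hold full paper texts in context.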

#### 4.2 Cluster Analysis

Based on paper titles, abstracts, and source terms, identify 3-6 research directions.

#### 4.3 Create Folders and Move

```bash
mkdir -p "$PROJECT_DIR/papers/data-driven"
mv "$PROJECT_DIR/papers/_downloads/2401.12345" "$PROJECT_DIR/papers/data-driven/"
```

### Phase 5: Generate Report

Create `survey/report.md` with:
- Survey summary (search term count, paper count, direction count)
- Overview of each research direction
- Top 10 papers
- **Reference repo summary** (cite prepare_res.md)
- Recommended reading order

## Key Design Principles

| Principle | Description |
|-----------|-------------|
| **Incremental processing** | Each search term independently completes search->filter->download->metadata, avoiding context bloat |
| **Metadata-driven** | Classification is based on `_meta/*.json`, not large in-memory lists |
| **Folders as categories** | Clustering results are reflected in the `papers/{direction}/` structure |

## Tools

| Tool | Purpose |
|------|---------|
| `arxiv_search` | Search papers (no side effects) |
| `arxiv_download` | Download .tex/.pdf (requires absolute path) |