npm - @wentorai/research-plugins - Versions diffs - 1.0.0 → 1.2.0 - Mend

@wentorai/research-plugins 1.0.0 → 1.2.0


Files changed (415)

package/skills/research/deep-research/deep-research-work/SKILL.md ADDED

@@ -0,0 +1,204 @@
+---
+name: deep-research-work
+description: "Combine web search, content analysis, and source verification"
+metadata:
+  openclaw:
+    emoji: "🌐"
+    category: "research"
+    subcategory: "deep-research"
+    keywords: ["web research", "content analysis", "source verification", "information extraction", "research workflow", "web scraping"]
+    source: "https://github.com/AcademicSkills/deep-research-work"
+---
+# Deep Research Work
+A practical deep research workflow that combines web search, structured content analysis, and systematic source verification to produce comprehensive, trustworthy research outputs. Optimized for researchers who need to rapidly synthesize information from diverse online sources while maintaining academic standards of evidence quality.
+## Overview
+Academic research increasingly requires synthesizing information from beyond traditional journal databases: government datasets, technical documentation, industry reports, software repositories, news coverage, and expert commentary. Deep Research Work provides an operational workflow for conducting this kind of multi-source research efficiently while maintaining rigor. It covers search strategy formulation, content extraction, structured note-taking, source credibility assessment, and synthesis into coherent research narratives.
+The workflow is designed to be completed in a single focused session (2-8 hours) and produces a structured research document with full source attribution. It is particularly useful for rapid literature scans, technology landscape assessments, policy research, and interdisciplinary investigations where no single database covers the full scope.
+## Search Strategy Design
+### Query Expansion Technique
+```python
+def generate_search_queries(topic: str, context: dict) -> list:
+    """
+    Generate a comprehensive set of search queries using
+    systematic query expansion.
+    Expansion strategies implemented below:
+    1. Synonym expansion: use alternative terminology
+    2. Scope expansion: broaden/narrow the topic
+    3. Perspective expansion: different stakeholder views
+    4. Source-type targeting: steer toward reviews, reports, datasets
+    Temporal and geographic expansion are not implemented here; add
+    such terms (e.g. a year or a region) via context['sub_topics'].
+    """
+    base_queries = [topic]
+    # Synonym expansion
+    synonyms = context.get('synonyms', [])
+    for syn in synonyms:
+        base_queries.append(syn)
+    # Scope expansion
+    broader = context.get('broader_topic', '')
+    narrower = context.get('sub_topics', [])
+    if broader:
+        base_queries.append(f"{broader} {topic}")
+    for sub in narrower:
+        base_queries.append(f"{topic} {sub}")
+    # Perspective expansion
+    perspectives = ['benefits', 'risks', 'challenges', 'future',
+                    'criticism', 'comparison', 'case study']
+    for p in perspectives:
+        base_queries.append(f"{topic} {p}")
+    # Source-type targeting
+    source_types = ['systematic review', 'meta-analysis', 'white paper',
+                    'technical report', 'dataset', 'open source']
+    for st in source_types:
+        base_queries.append(f"{topic} {st}")
+    # Deduplicate while preserving insertion order (set() would scramble it)
+    return list(dict.fromkeys(base_queries))
+```
+### Database Selection Matrix
+| Source Type | Best Databases | When to Use |
+|------------|---------------|-------------|
+| Peer-reviewed articles | Google Scholar, Semantic Scholar, PubMed | Core academic evidence |
+| Preprints | arXiv, bioRxiv, SSRN, medRxiv | Cutting-edge, pre-review findings |
+| Government/institutional | Data.gov, WHO, OECD, national statistics | Official data, policy context |
+| Technical documentation | GitHub, ReadTheDocs, official docs | Software, tools, methods |
+| Industry reports | McKinsey, Gartner, CB Insights | Market context, trends |
+| Patent databases | Google Patents, USPTO, Espacenet | Innovation landscape |
+| News and media | Google News, specialized trade press | Current events, context |
+## Content Extraction and Note-Taking
+### Structured Extraction Template
+For each source reviewed, extract the following:
+```yaml
+source_entry:
+  id: "S001"
+  url: "https://..."
+  title: "Title of the Source"
+  authors: ["Author A", "Author B"]
+  date: "2025-06"
+  type: "journal_article"  # or preprint, report, blog, etc.
+  extraction:
+    main_claim: "One sentence summarizing the key claim or finding"
+    evidence_type: "empirical"  # empirical, theoretical, anecdotal, opinion
+    methodology: "Randomized controlled trial, n=500"
+    key_data_points:
+      - "Finding 1: X increased by 23% (p < 0.01)"
+      - "Finding 2: No significant effect on Y"
+    limitations_noted: "Small sample from single institution"
+    relevant_quotes:
+      - page: 12
+        text: "Our results suggest that..."
+  assessment:
+    credibility: "high"  # high, medium, low
+    relevance: "high"    # high, medium, low
+    novelty: "medium"    # high, medium, low
+    bias_concerns: "Funded by industry; potential conflict of interest"
+```
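The template above maps naturally onto a small typed record. The following is a minimal sketch, assuming Python 3.9+; the names `SourceEntry`, `ALLOWED_LEVELS`, and `validate` are hypothetical and not part of the skill. It covers only a subset of the template's fields and checks that the three assessment ratings use the `high`/`medium`/`low` scale.

```python
from dataclasses import dataclass, field

# Rating scale taken from the comments in the template's assessment block.
ALLOWED_LEVELS = {"high", "medium", "low"}


@dataclass
class SourceEntry:
    """Hypothetical in-memory form of one `source_entry` record."""
    id: str
    url: str
    title: str
    main_claim: str
    evidence_type: str = "empirical"
    credibility: str = "medium"
    relevance: str = "medium"
    novelty: str = "medium"
    key_data_points: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the entry is usable."""
        problems = []
        for name in ("credibility", "relevance", "novelty"):
            value = getattr(self, name)
            if value not in ALLOWED_LEVELS:
                problems.append(f"{name} must be one of {sorted(ALLOWED_LEVELS)}, got {value!r}")
        if not self.main_claim.strip():
            problems.append("main_claim is empty")
        return problems


entry = SourceEntry(
    id="S001",
    url="https://example.org/paper",
    title="Title of the Source",
    main_claim="X increased by 23%",
    credibility="high",
)
print(entry.validate())
```

Returning a list of problems rather than raising on the first one lets a reviewer fix every issue in an entry in a single pass.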
+### Progressive Summarization
+Apply a layered note-taking approach:
+1. **Layer 1 - Capture**: Save the full source with metadata (URL, date, authors).
+2. **Layer 2 - Bold**: Highlight the most important passages (key findings, methods, conclusions).
+3. **Layer 3 - Highlight**: From the bolded text, mark the essential takeaways for your research question.
+4. **Layer 4 - Summary**: Write a 2-3 sentence summary in your own words.
+5. **Layer 5 - Remix**: Connect the finding to your other sources and your research question.
+## Source Verification Protocol
+### Credibility Assessment Checklist
+For each source, evaluate:
+- [ ] **Authority**: Who is the author/organization? What are their credentials?
+- [ ] **Accuracy**: Are claims supported by evidence? Can you verify the data?
+- [ ] **Currency**: When was it published? Is the information still valid?
+- [ ] **Coverage**: Does it address your question sufficiently?
+- [ ] **Objectivity**: Is there apparent bias? Who funded the work?
+- [ ] **Corroboration**: Do other independent sources support the same claims?
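One way to turn the six checklist answers into the `credibility` rating used in the extraction template is a simple pass count. This is a sketch under stated assumptions: the function name `credibility_rating` and the thresholds (five or more passes for `high`, three or more for `medium`) are illustrative choices, not something the checklist prescribes.

```python
CRITERIA = ("authority", "accuracy", "currency", "coverage", "objectivity", "corroboration")


def credibility_rating(checks: dict[str, bool]) -> str:
    """Map the six checklist answers to high/medium/low.

    The thresholds below are illustrative assumptions, not part of the checklist.
    """
    missing = [c for c in CRITERIA if c not in checks]
    if missing:
        raise ValueError(f"unanswered criteria: {missing}")
    passed = sum(bool(checks[c]) for c in CRITERIA)
    if passed >= 5:
        return "high"
    if passed >= 3:
        return "medium"
    return "low"


answers = {c: True for c in CRITERIA}
answers["objectivity"] = False  # e.g. industry-funded study
print(credibility_rating(answers))
```

A pass count treats all criteria as equally important; a researcher who considers corroboration decisive could weight it more heavily instead.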
+### Red Flags for Low-Quality Sources
+| Red Flag | Action |
+|----------|--------|
+| No author attribution | Downgrade credibility; seek alternative source |
+| No date published | Treat as potentially outdated |
+| Extraordinary claims without evidence | Require independent corroboration |
+| Known predatory journal | Exclude from primary evidence |
+| Single anonymous blog post | Use only as lead to find primary sources |
+| Circular citations | Trace back to the original source |
+## Synthesis Workflow
+### From Notes to Narrative
+```
+Step 1: Cluster
+  Group extracted notes by theme or sub-question.
+  Use tags from your extraction template.
+Step 2: Compare
+  Within each cluster, compare findings across sources.
+  Note agreements, contradictions, and gaps.
+Step 3: Evaluate
+  Weight evidence by source credibility and recency.
+  Higher-quality sources take precedence when sources conflict.
+Step 4: Narrate
+  Write a synthesis paragraph for each cluster that:
+  - States the overall finding
+  - Cites the supporting sources
+  - Notes any caveats or contradictions
+  - Identifies remaining gaps
+Step 5: Integrate
+  Connect clusters into a coherent narrative.
+  Highlight cross-cutting themes and implications.
+```
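Steps 1 and 3 above (cluster, then weight by credibility and recency) can be sketched in a few lines. The note fields mirror the extraction template, plus a `theme` tag that is an assumption of this sketch; the numeric weights in `CREDIBILITY_WEIGHT` are likewise illustrative.

```python
from collections import defaultdict

# Assumed numeric weights for the template's credibility ratings.
CREDIBILITY_WEIGHT = {"high": 3, "medium": 2, "low": 1}


def cluster_notes(notes):
    """Step 1: group notes by their 'theme' tag."""
    clusters = defaultdict(list)
    for note in notes:
        clusters[note["theme"]].append(note)
    return dict(clusters)


def rank_cluster(cluster):
    """Step 3: order notes by credibility weight, then by date (newest first)."""
    return sorted(
        cluster,
        key=lambda n: (CREDIBILITY_WEIGHT[n["credibility"]], n["date"]),
        reverse=True,
    )


notes = [
    {"id": "S001", "theme": "efficacy", "credibility": "medium", "date": "2025-06"},
    {"id": "S002", "theme": "efficacy", "credibility": "high", "date": "2023-01"},
    {"id": "S003", "theme": "cost", "credibility": "low", "date": "2024-11"},
]
clusters = cluster_notes(notes)
ranked = {theme: [n["id"] for n in rank_cluster(c)] for theme, c in clusters.items()}
print(ranked)
```

Here credibility dominates recency, so the older high-credibility source S002 ranks ahead of the newer medium-credibility S001. Dates in `YYYY-MM` form sort correctly as plain strings.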
+### Output Quality Checklist
+Before finalizing your research output:
+- [ ] Every factual claim has at least one source citation
+- [ ] Contradictory evidence is explicitly acknowledged
+- [ ] Source quality is visible (not all sources treated equally)
+- [ ] Gaps in knowledge are clearly identified
+- [ ] The search methodology is documented for reproducibility
+- [ ] Dates of all searches are recorded
+- [ ] The output answers the original research question
+## Best Practices
+- Set a time limit before starting. Research can expand indefinitely without constraints.
+- Use a reference manager (Zotero, Mendeley) from the start, even for informal research.
+- Save web pages as PDF or archive snapshots (Wayback Machine) to prevent link rot.
+- Distinguish between primary sources (original data/study) and secondary sources (reporting on the study).
+- When a source cites a finding, always try to trace back to the original source.
+- Document negative results: sources searched that did not yield relevant information.
+## References
+- Booth, A., Sutton, A., & Papaioannou, D. (2016). *Systematic Approaches to a Successful Literature Review* (2nd ed.). Sage.
+- Forte, T. (2022). *Building a Second Brain*. Atria Books.
+- Machi, L. A. & McEvoy, B. T. (2016). *The Literature Review* (3rd ed.). Corwin Press.

package/skills/research/deep-research/deep-searcher-guide/SKILL.md ADDED

@@ -0,0 +1,253 @@
+---
+name: deep-searcher-guide
+description: "Open deep research alternative for private data with vector search"
+metadata:
+  openclaw:
+    emoji: "🔍"
+    category: "research"
+    subcategory: "deep-research"
+    keywords: ["deep-search", "private-data", "milvus", "vector-search", "rag", "document-retrieval"]
+    source: "https://github.com/zilliztech/deep-searcher"
+---
+# Deep Searcher Guide
+## Overview
+Deep Searcher is an open-source deep research tool developed by Zilliz with over 8,000 GitHub stars, designed to be an open alternative to proprietary deep research systems like OpenAI's Deep Research and Gemini Deep Research. What distinguishes Deep Searcher is its focus on private data: it lets researchers conduct deep, iterative research over their own document collections, databases, and institutional knowledge bases rather than being limited to public web content.
+The system combines vector search via Milvus (or other vector databases) with agentic RAG (Retrieval-Augmented Generation) to decompose complex research questions, retrieve relevant passages from your document collection, reason over the retrieved content, and iteratively refine its search until it can produce a comprehensive answer. This makes it particularly valuable for researchers who work with proprietary datasets, unpublished manuscripts, internal reports, or specialized domain corpora that are not available through web search.
+Deep Searcher supports multiple LLM providers and embedding models, and can be deployed entirely on-premises for organizations with strict data privacy requirements. It is built on top of Milvus, the high-performance open-source vector database also created by Zilliz, ensuring scalable and efficient similarity search across large document collections.
+## Installation and Setup
+```bash
+# Install Deep Searcher
+pip install deepsearcher
+# Or clone for development
+git clone https://github.com/zilliztech/deep-searcher.git
+cd deep-searcher
+pip install -e .
+```
+### Dependencies Setup
+Deep Searcher requires a vector database and LLM access:
+```bash
+# Option 1: Milvus Lite (embedded, no separate server needed)
+pip install "pymilvus[model]"  # quoted so shells such as zsh do not expand the brackets
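+# Option 2: Full Milvus standalone via Docker
+# (the official image is milvusdb/milvus; the Milvus docs provide a
+# standalone_embed.sh script that also sets the embedded etcd options)
+docker run -d --name milvus \
+  -p 19530:19530 \
+  -p 9091:9091 \
+  milvusdb/milvus:latest milvus run standalone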
+# Configure LLM access
+export OPENAI_API_KEY="<your-openai-api-key>"
+# Or for local LLMs
+export OLLAMA_BASE_URL=http://localhost:11434
+```
+### Configuration
+Create a configuration file for your research setup:
+```python
+from deepsearcher import DeepSearcher
+from deepsearcher.config import Config
+config = Config(
+    # Vector database settings
+    vector_db="milvus_lite",  # or "milvus", "zilliz_cloud"
+    collection_name="research_papers",
+    # LLM settings
+    llm_provider="openai",
+    llm_model="gpt-4o",
+    # Embedding settings
+    embedding_model="text-embedding-3-small",
+    # Research settings
+    max_iterations=10,
+    chunk_size=1000,
+    chunk_overlap=200,
+)
+searcher = DeepSearcher(config)
+```
+## Document Ingestion
+### Loading Research Documents
+Ingest your research documents into the vector database for searchable access:
+```python
+# Load individual files
+searcher.load_document("path/to/paper.pdf")
+searcher.load_document("path/to/notes.md")
+# Load entire directories
+searcher.load_directory(
+    "path/to/papers/",
+    file_types=["pdf", "md", "txt", "docx"],
+    recursive=True,
+)
+# Load with metadata for filtering
+searcher.load_document(
+    "path/to/paper.pdf",
+    metadata={
+        "author": "Smith et al.",
+        "year": 2024,
+        "topic": "transformer efficiency",
+        "venue": "NeurIPS",
+    }
+)
+```
+### Supported Document Types
+Deep Searcher supports a wide range of document formats commonly used in academic research:
+- **PDF**: Research papers, textbooks, reports (with OCR support for scanned documents)
+- **Markdown**: Research notes, documentation, wikis
+- **Plain text**: Data files, logs, transcripts
+- **DOCX/DOC**: Word documents, manuscripts
+- **HTML**: Web pages, saved articles
+- **LaTeX**: TeX source files with equation extraction
+- **Jupyter Notebooks**: Code and analysis notebooks
+## Deep Research Workflow
+### Basic Research Query
+```python
+# Ask a research question over your document collection
+result = searcher.research(
+    query="What methods have been proposed for reducing the "
+          "computational complexity of self-attention in transformers?",
+)
+print(result.answer)
+print(f"Sources: {len(result.sources)}")
+for source in result.sources:
+    print(f"  - {source.document}: {source.chunk_preview[:100]}...")
+```
+### Iterative Research Process
+Deep Searcher follows an iterative research pipeline:
+1. **Query decomposition**: The research question is broken into sub-queries
+2. **Initial retrieval**: Vector search retrieves relevant passages for each sub-query
+3. **Analysis**: The LLM analyzes retrieved content and identifies information gaps
+4. **Refined search**: New queries are generated to fill gaps, with the search refined based on what has been found
+5. **Synthesis**: All gathered information is synthesized into a comprehensive answer with citations
+```python
+# Watch the iterative research process
+result = searcher.research(
+    query="Compare the approaches to efficient attention in the papers "
+          "I have collected, focusing on trade-offs between speed and quality",
+    verbose=True,  # Print each research iteration
+    max_iterations=8,
+)
+# Access the research trace
+for step in result.trace:
+    print(f"Iteration {step.iteration}:")
+    print(f"  Sub-query: {step.query}")
+    print(f"  Documents found: {step.num_results}")
+    print(f"  Gap identified: {step.gap}")
+```
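The control flow of the iterative pipeline described above can be illustrated with a toy loop. This is not Deep Searcher's implementation: keyword overlap stands in for vector search, and a caller-supplied `find_gap` function stands in for the LLM's gap analysis. All names here (`retrieve`, `iterative_research`, `find_gap`) are hypothetical.

```python
def retrieve(query: str, corpus: dict[str, str], top_k: int = 2) -> list[str]:
    """Stand-in for vector search: rank documents by shared lowercase words."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(text.lower().split())), doc_id) for doc_id, text in corpus.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True)[:top_k] if score > 0]


def iterative_research(question, corpus, find_gap, max_iterations=5):
    """Retrieve, ask `find_gap` for a follow-up query, and stop when none is returned."""
    found: list[str] = []
    trace = []
    query = question
    for iteration in range(1, max_iterations + 1):
        hits = [d for d in retrieve(query, corpus) if d not in found]
        found.extend(hits)
        trace.append({"iteration": iteration, "query": query, "new_docs": hits})
        query = find_gap(question, found)  # an LLM call in a real system
        if query is None:
            break
    return found, trace


corpus = {
    "a": "sparse attention reduces quadratic cost",
    "b": "linear attention uses kernel feature maps",
    "c": "protein folding prediction",
}


def find_gap(question, found):
    # Hypothetical gap rule: ask about linear attention until doc "b" is found.
    return None if "b" in found else "linear attention kernel"


found, trace = iterative_research("sparse quadratic cost", corpus, find_gap)
for step in trace:
    print(step)
```

The `max_iterations` cap plays the same role as the parameter of that name in the calls above: it bounds the loop even when the gap analysis keeps proposing new queries.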
+### Filtered Research
+Narrow your research to specific subsets of your collection:
+```python
+# Research only within papers from a specific venue
+result = searcher.research(
+    query="Novel loss functions for contrastive learning",
+    filters={"venue": "ICML", "year": {"$gte": 2023}},
+)
+# Research across specific document groups
+result = searcher.research(
+    query="How do the baseline methods compare across my experiment logs?",
+    filters={"topic": "baseline-comparison"},
+)
+```
+## Integration with Research Tools
+### Combining Private and Public Data
+Deep Searcher can be combined with web search for comprehensive research that covers both your private collection and public sources:
+```python
+from deepsearcher.sources import WebSearchSource
+# Add web search as an additional source
+config.add_source(WebSearchSource(
+    provider="tavily",
+    api_key_env="TAVILY_API_KEY",
+))
+# Research now spans both private documents and the web
+result = searcher.research(
+    query="Recent advances in protein folding prediction",
+    sources=["private", "web"],
+)
+```
+### Export and Sharing
+Export research results in formats suitable for academic use:
+```python
+# Export as markdown report
+result.export_markdown("research_report.md")
+# Export citations in BibTeX format
+result.export_citations("references.bib")
+# Export the full research trace for reproducibility
+result.export_trace("research_trace.json")
+```
+### API Server
+Run Deep Searcher as a service for team-wide access:
+```bash
+# Start the API server
+deepsearcher serve --host 0.0.0.0 --port 8000
+# Query via REST API
+curl -X POST http://localhost:8000/research \
+  -H "Content-Type: application/json" \
+  -d '{"query": "What are the key findings in our latest experiments?"}'
+```
+## Performance and Scalability
+Deep Searcher leverages Milvus for high-performance vector search, which means it can handle document collections ranging from hundreds to millions of documents efficiently. Key performance considerations include:
+- **Indexing**: Milvus uses HNSW or IVF indexes for fast approximate nearest neighbor search
+- **Chunking strategy**: Adjustable chunk size and overlap to balance retrieval precision and recall
+- **Embedding caching**: Previously computed embeddings are cached to avoid redundant computation
+- **Batch processing**: Documents can be ingested in parallel for faster indexing
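To make the chunking trade-off concrete, here is a minimal sliding-window splitter using the `chunk_size=1000` and `chunk_overlap=200` values from the configuration example. It splits on characters, which is an assumption of this sketch; the splitter Deep Searcher actually uses may work on tokens or document structure.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by `chunk_overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = chunk_text("x" * 2500)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

Larger overlap reduces the chance that a relevant passage is cut in half at a chunk boundary, at the cost of storing and embedding more redundant text.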
+## References
+- Repository: https://github.com/zilliztech/deep-searcher
+- Milvus vector database: https://milvus.io/
+- Zilliz Cloud (managed Milvus): https://zilliz.com/
+- Milvus documentation: https://milvus.io/docs/

package/skills/research/deep-research/gpt-researcher-guide/SKILL.md ADDED

@@ -0,0 +1,191 @@
+---
+name: gpt-researcher-guide
+description: "Autonomous agent for comprehensive deep research on any topic"
+metadata:
+  openclaw:
+    emoji: "🔬"
+    category: "research"
+    subcategory: "deep-research"
+    keywords: ["deep-research", "autonomous-agent", "web-search", "report-generation", "literature-review"]
+    source: "https://github.com/assafelovic/gpt-researcher"
+---
+# GPT Researcher Guide
+## Overview
+GPT Researcher is an autonomous research agent with over 26,000 GitHub stars that conducts comprehensive online research on any given topic. Developed by Assaf Elovic, it generates detailed, factual, and unbiased research reports by planning research questions, searching multiple sources, scraping and filtering relevant content, and synthesizing findings into well-structured reports with citations.
+The agent addresses a fundamental challenge in AI-assisted research: generating accurate, comprehensive reports rather than relying on a single LLM's potentially outdated or hallucinated knowledge. GPT Researcher uses a multi-agent architecture where a planner agent decomposes the research query into sub-questions, multiple retriever agents gather information from diverse sources, and a writer agent synthesizes everything into a coherent report.
+For academic researchers, GPT Researcher is valuable for conducting preliminary literature surveys, exploring unfamiliar research domains, gathering background information for grant proposals, and generating initial drafts of review sections. The agent can be configured to search specific domains, use academic search engines, and output reports in various formats including markdown and PDF.
+## Installation and Setup
+```bash
+# Install from PyPI
+pip install gpt-researcher
+# Or clone for development
+git clone https://github.com/assafelovic/gpt-researcher.git
+cd gpt-researcher
+pip install -e .
+```
+Configure your environment with API keys using environment variables:
+```bash
+# Required: LLM provider (choose one)
+export OPENAI_API_KEY="<your-openai-api-key>"
+# Or use other providers
+export ANTHROPIC_API_KEY="<your-anthropic-api-key>"
+# Required: Search provider (choose one)
+export TAVILY_API_KEY="<your-tavily-api-key>"
+# Or alternatives
+export SERPER_API_KEY="<your-serper-api-key>"
+export SEARX_URL="<your-searxng-url>"
+```
+For a fully local setup without external API dependencies, you can configure local LLMs and search engines:
+```bash
+# Use local LLM via Ollama
+export OPENAI_BASE_URL=http://localhost:11434/v1
+export LLM_PROVIDER=ollama
+export FAST_LLM=llama3
+export SMART_LLM=llama3
+# Use local search via SearXNG
+export SEARX_URL=http://localhost:8888
+export RETRIEVER=searx
+```
+## Core Research Workflow
+### Basic Research Report
+Generate a research report with a single function call:
+```python
+from gpt_researcher import GPTResearcher
+import asyncio
+async def run_research():
+    query = "Recent advances in protein structure prediction using deep learning"
+    researcher = GPTResearcher(query=query, report_type="research_report")
+    # Conduct research (searches, scrapes, analyzes sources)
+    research_result = await researcher.conduct_research()
+    # Generate the final report
+    report = await researcher.write_report()
+    # Access sources used
+    sources = researcher.get_source_urls()
+    print(f"Report based on {len(sources)} sources")
+    print(report)
+asyncio.run(run_research())
+```
+### Report Types
+GPT Researcher supports multiple report types tailored to different needs:
+- **research_report**: Comprehensive report with findings and analysis (default)
+- **detailed_report**: Extended multi-page report with deeper analysis
+- **resource_report**: Curated list of sources with summaries and relevance scores
+- **outline_report**: Structured outline for further manual research
+- **subtopic_report**: Focused report on a specific subtopic within a broader area
+```python
+# Generate a detailed multi-page report
+researcher = GPTResearcher(
+    query="Transformer architectures for scientific document understanding",
+    report_type="detailed_report",
+    max_subtopics=5,
+)
+```
+### Multi-Agent Architecture
+The research process follows a multi-agent pipeline:
+1. **Planner Agent**: Decomposes the research query into 4-6 focused sub-questions
+2. **Retriever Agents**: Each sub-question is researched independently by a dedicated agent that searches, scrapes, and filters content
+3. **Ranker Agent**: Evaluates and ranks gathered sources by relevance and quality
+4. **Writer Agent**: Synthesizes all findings into a coherent, well-structured report with inline citations
+```python
+# Customize the research configuration
+researcher = GPTResearcher(
+    query="Impact of climate change on marine biodiversity",
+    report_type="research_report",
+    source_urls=None,  # Or provide specific URLs to research
+    config_path=None,  # Or path to custom config
+    max_search_results_per_query=5,
+    verbose=True,
+)
+```
+## Advanced Configuration
+### Custom Source Restrictions
+Restrict research to specific domains or provide seed URLs:
+```python
+# Research only from specific academic sources
+researcher = GPTResearcher(
+    query="CRISPR gene editing safety profiles",
+    source_urls=[
+        "https://pubmed.ncbi.nlm.nih.gov/",
+        "https://www.nature.com/",
+        "https://www.science.org/",
+    ],
+)
+```
+### LLM Configuration
+Configure different LLMs for different stages of the research pipeline:
+```python
+# Use a fast model for planning and a powerful model for writing
+# Set via environment variables
+# FAST_LLM: Used for sub-question generation and filtering
+# SMART_LLM: Used for report synthesis and writing
+```
+### Integration with FastAPI
+GPT Researcher includes a web interface and API server:
+```bash
+# Start the web UI and API server
+cd gpt-researcher
+pip install -r requirements.txt
+python -m uvicorn main:app --host 0.0.0.0 --port 8000
+```
+The API exposes WebSocket endpoints for streaming research progress and REST endpoints for report management, making it easy to integrate into existing research platforms.
+## Academic Research Applications
+GPT Researcher can be adapted for several academic use cases:
+- **Preliminary literature surveys**: Quickly scan the landscape of a new research area before conducting a formal systematic review
+- **Grant proposal background**: Gather recent developments and state-of-the-art results to strengthen research proposals
+- **Conference talk preparation**: Generate comprehensive overviews of related work for presentations
+- **Cross-disciplinary exploration**: Investigate adjacent fields to identify potential collaboration opportunities or interdisciplinary approaches
+- **Fact-checking and verification**: Cross-reference claims across multiple sources to validate research findings
+The reports include full citations with URLs, making it straightforward to verify sources and follow up with deeper reading of primary literature.
+## References
+- Repository: https://github.com/assafelovic/gpt-researcher
+- Documentation: https://docs.gptr.dev/
+- Tavily Search API: https://tavily.com/
+- Plan-and-Solve Prompting paper (cited by the project as inspiration for its planning approach): https://arxiv.org/abs/2305.04091