@wentorai/research-plugins 1.2.3 → 1.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +16 -8
- package/openclaw.plugin.json +10 -3
- package/package.json +2 -5
- package/skills/analysis/dataviz/SKILL.md +25 -0
- package/skills/analysis/dataviz/chart-image-generator/SKILL.md +1 -1
- package/skills/analysis/econometrics/SKILL.md +23 -0
- package/skills/analysis/econometrics/robustness-checks/SKILL.md +1 -1
- package/skills/analysis/statistics/SKILL.md +21 -0
- package/skills/analysis/statistics/data-anomaly-detection/SKILL.md +1 -1
- package/skills/analysis/statistics/ml-experiment-tracker/SKILL.md +1 -1
- package/skills/analysis/statistics/{senior-data-scientist-guide → modeling-strategy-guide}/SKILL.md +5 -5
- package/skills/analysis/wrangling/SKILL.md +21 -0
- package/skills/analysis/wrangling/csv-data-analyzer/SKILL.md +1 -1
- package/skills/analysis/wrangling/data-cog-guide/SKILL.md +1 -1
- package/skills/domains/ai-ml/SKILL.md +37 -0
- package/skills/domains/biomedical/SKILL.md +28 -0
- package/skills/domains/biomedical/genomas-guide/SKILL.md +1 -1
- package/skills/domains/biomedical/med-researcher-guide/SKILL.md +1 -1
- package/skills/domains/biomedical/medgeclaw-guide/SKILL.md +1 -1
- package/skills/domains/business/SKILL.md +17 -0
- package/skills/domains/business/architecture-design-guide/SKILL.md +1 -1
- package/skills/domains/chemistry/SKILL.md +19 -0
- package/skills/domains/chemistry/computational-chemistry-guide/SKILL.md +1 -1
- package/skills/domains/cs/SKILL.md +21 -0
- package/skills/domains/ecology/SKILL.md +16 -0
- package/skills/domains/economics/SKILL.md +20 -0
- package/skills/domains/economics/post-labor-economics/SKILL.md +1 -1
- package/skills/domains/economics/pricing-psychology-guide/SKILL.md +1 -1
- package/skills/domains/education/SKILL.md +19 -0
- package/skills/domains/education/academic-study-methods/SKILL.md +1 -1
- package/skills/domains/education/edumcp-guide/SKILL.md +1 -1
- package/skills/domains/finance/SKILL.md +19 -0
- package/skills/domains/finance/akshare-finance-data/SKILL.md +1 -1
- package/skills/domains/finance/options-analytics-agent-guide/SKILL.md +1 -1
- package/skills/domains/finance/stata-accounting-research/SKILL.md +1 -1
- package/skills/domains/geoscience/SKILL.md +17 -0
- package/skills/domains/humanities/SKILL.md +16 -0
- package/skills/domains/humanities/history-research-guide/SKILL.md +1 -1
- package/skills/domains/humanities/political-history-guide/SKILL.md +1 -1
- package/skills/domains/law/SKILL.md +19 -0
- package/skills/domains/math/SKILL.md +17 -0
- package/skills/domains/pharma/SKILL.md +17 -0
- package/skills/domains/physics/SKILL.md +16 -0
- package/skills/domains/social-science/SKILL.md +17 -0
- package/skills/domains/social-science/sociology-research-methods/SKILL.md +1 -1
- package/skills/literature/discovery/SKILL.md +20 -0
- package/skills/literature/discovery/paper-recommendation-guide/SKILL.md +1 -1
- package/skills/literature/discovery/semantic-paper-radar/SKILL.md +1 -1
- package/skills/literature/fulltext/SKILL.md +26 -0
- package/skills/literature/metadata/SKILL.md +35 -0
- package/skills/literature/metadata/doi-content-negotiation/SKILL.md +4 -0
- package/skills/literature/metadata/doi-resolution-guide/SKILL.md +4 -0
- package/skills/literature/metadata/orcid-api/SKILL.md +4 -0
- package/skills/literature/metadata/orcid-integration-guide/SKILL.md +4 -0
- package/skills/literature/search/SKILL.md +43 -0
- package/skills/literature/search/paper-search-mcp-guide/SKILL.md +1 -1
- package/skills/research/automation/SKILL.md +21 -0
- package/skills/research/deep-research/SKILL.md +24 -0
- package/skills/research/deep-research/auto-deep-research-guide/SKILL.md +1 -1
- package/skills/research/deep-research/in-depth-research-guide/SKILL.md +1 -1
- package/skills/research/funding/SKILL.md +20 -0
- package/skills/research/methodology/SKILL.md +24 -0
- package/skills/research/paper-review/SKILL.md +19 -0
- package/skills/research/paper-review/paper-critique-framework/SKILL.md +1 -1
- package/skills/tools/code-exec/SKILL.md +18 -0
- package/skills/tools/diagram/SKILL.md +20 -0
- package/skills/tools/document/SKILL.md +21 -0
- package/skills/tools/knowledge-graph/SKILL.md +21 -0
- package/skills/tools/ocr-translate/SKILL.md +18 -0
- package/skills/tools/ocr-translate/handwriting-recognition-guide/SKILL.md +2 -0
- package/skills/tools/ocr-translate/latex-ocr-guide/SKILL.md +2 -0
- package/skills/tools/scraping/SKILL.md +17 -0
- package/skills/writing/citation/SKILL.md +33 -0
- package/skills/writing/citation/zotfile-attachment-guide/SKILL.md +2 -0
- package/skills/writing/composition/SKILL.md +22 -0
- package/skills/writing/composition/research-paper-writer/SKILL.md +1 -1
- package/skills/writing/composition/scientific-writing-wrapper/SKILL.md +1 -1
- package/skills/writing/latex/SKILL.md +22 -0
- package/skills/writing/latex/academic-writing-latex/SKILL.md +1 -1
- package/skills/writing/latex/latex-drawing-guide/SKILL.md +1 -1
- package/skills/writing/polish/SKILL.md +20 -0
- package/skills/writing/polish/chinese-text-humanizer/SKILL.md +1 -1
- package/skills/writing/templates/SKILL.md +22 -0
- package/skills/writing/templates/beamer-presentation-guide/SKILL.md +1 -1
- package/skills/writing/templates/scientific-article-pdf/SKILL.md +1 -1
- package/skills/analysis/dataviz/citation-map-guide/SKILL.md +0 -184
- package/skills/analysis/dataviz/data-visualization-principles/SKILL.md +0 -171
- package/skills/analysis/econometrics/empirical-paper-analysis/SKILL.md +0 -192
- package/skills/analysis/econometrics/panel-data-regression-workflow/SKILL.md +0 -267
- package/skills/analysis/econometrics/stata-regression/SKILL.md +0 -117
- package/skills/analysis/statistics/general-statistics-guide/SKILL.md +0 -226
- package/skills/analysis/statistics/infiagent-benchmark-guide/SKILL.md +0 -106
- package/skills/analysis/statistics/pywayne-statistics-guide/SKILL.md +0 -192
- package/skills/analysis/statistics/quantitative-methods-guide/SKILL.md +0 -193
- package/skills/analysis/wrangling/claude-data-analysis-guide/SKILL.md +0 -100
- package/skills/analysis/wrangling/open-data-scientist-guide/SKILL.md +0 -197
- package/skills/domains/ai-ml/annotated-dl-papers-guide/SKILL.md +0 -159
- package/skills/domains/humanities/digital-humanities-methods/SKILL.md +0 -232
- package/skills/domains/law/legal-research-methods/SKILL.md +0 -190
- package/skills/domains/social-science/sociology-research-guide/SKILL.md +0 -238
- package/skills/literature/discovery/arxiv-paper-monitoring/SKILL.md +0 -233
- package/skills/literature/discovery/paper-tracking-guide/SKILL.md +0 -211
- package/skills/literature/fulltext/zotero-scihub-guide/SKILL.md +0 -168
- package/skills/literature/search/arxiv-osiris/SKILL.md +0 -199
- package/skills/literature/search/deepgit-search-guide/SKILL.md +0 -147
- package/skills/literature/search/multi-database-literature-search/SKILL.md +0 -198
- package/skills/literature/search/papers-chat-guide/SKILL.md +0 -194
- package/skills/literature/search/pasa-paper-search-guide/SKILL.md +0 -138
- package/skills/literature/search/scientify-literature-survey/SKILL.md +0 -203
- package/skills/research/automation/ai-scientist-guide/SKILL.md +0 -228
- package/skills/research/automation/coexist-ai-guide/SKILL.md +0 -149
- package/skills/research/automation/foam-agent-guide/SKILL.md +0 -203
- package/skills/research/automation/research-paper-orchestrator/SKILL.md +0 -254
- package/skills/research/deep-research/academic-deep-research/SKILL.md +0 -190
- package/skills/research/deep-research/cognitive-kernel-guide/SKILL.md +0 -200
- package/skills/research/deep-research/corvus-research-guide/SKILL.md +0 -132
- package/skills/research/deep-research/deep-research-pro/SKILL.md +0 -213
- package/skills/research/deep-research/deep-research-work/SKILL.md +0 -204
- package/skills/research/deep-research/research-cog/SKILL.md +0 -153
- package/skills/research/methodology/academic-mentor-guide/SKILL.md +0 -169
- package/skills/research/methodology/deep-innovator-guide/SKILL.md +0 -242
- package/skills/research/methodology/research-pipeline-units-guide/SKILL.md +0 -169
- package/skills/research/paper-review/paper-compare-guide/SKILL.md +0 -238
- package/skills/research/paper-review/paper-digest-guide/SKILL.md +0 -240
- package/skills/research/paper-review/paper-research-assistant/SKILL.md +0 -231
- package/skills/research/paper-review/research-quality-filter/SKILL.md +0 -261
- package/skills/tools/code-exec/contextplus-mcp-guide/SKILL.md +0 -110
- package/skills/tools/diagram/clawphd-guide/SKILL.md +0 -149
- package/skills/tools/diagram/scientific-graphical-abstract/SKILL.md +0 -201
- package/skills/tools/document/md2pdf-xelatex/SKILL.md +0 -212
- package/skills/tools/document/openpaper-guide/SKILL.md +0 -232
- package/skills/tools/document/qq-connect/SKILL.md +0 -227
- package/skills/tools/document/weknora-guide/SKILL.md +0 -216
- package/skills/tools/knowledge-graph/mimir-memory-guide/SKILL.md +0 -135
- package/skills/tools/knowledge-graph/open-webui-tools-guide/SKILL.md +0 -156
- package/skills/tools/ocr-translate/formula-recognition-guide/SKILL.md +0 -367
- package/skills/tools/ocr-translate/math-equation-renderer/SKILL.md +0 -198
- package/skills/tools/scraping/api-data-collection-guide/SKILL.md +0 -301
- package/skills/writing/citation/academic-citation-manager-guide/SKILL.md +0 -182
- package/skills/writing/composition/opendraft-thesis-guide/SKILL.md +0 -200
- package/skills/writing/composition/paper-debugger-guide/SKILL.md +0 -143
- package/skills/writing/composition/paperforge-guide/SKILL.md +0 -205

package/skills/literature/search/arxiv-osiris/SKILL.md
@@ -1,199 +0,0 @@
----
-name: arxiv-osiris
-description: "Search and download arXiv papers via Python and PowerShell scripts"
-metadata:
-  openclaw:
-    emoji: "🔍"
-    category: "literature"
-    subcategory: "search"
-    keywords: ["arxiv", "paper download", "preprint search", "python script", "powershell", "literature retrieval"]
-    source: "https://clawhub.com/kostaskyq/arxiv-osiris"
----
-
-# arXiv Osiris — Paper Search and Download Tool
-
-## Overview
-
-arXiv Osiris provides cross-platform scripts (Python and PowerShell) for searching and downloading scientific papers from arXiv.org. It supports keyword search, category filtering, metadata retrieval, and direct PDF download. It is useful for researchers who prefer scripted automation over browser-based arXiv access, particularly for building local paper collections.
-
-## Installation
-
-```bash
-# Install the arxiv Python client (required dependency)
-pip install arxiv
-
-# Clone the tool (if using from source)
-git clone https://github.com/kostaskyq/arxiv-osiris.git
-```
-
-## Usage — Python API
-
-### Search for Papers
-
-```python
-import arxiv
-
-# Basic keyword search
-search = arxiv.Search(
-    query="quantum computing error correction",
-    max_results=10,
-    sort_by=arxiv.SortCriterion.Relevance
-)
-
-client = arxiv.Client()
-for result in client.results(search):
-    print(f"ID: {result.entry_id}")
-    print(f"Title: {result.title}")
-    print(f"Authors: {', '.join(a.name for a in result.authors)}")
-    print(f"Published: {result.published.strftime('%Y-%m-%d')}")
-    print(f"PDF: {result.pdf_url}")
-    print(f"Abstract: {result.summary[:200]}...")
-    print()
-```
-
-### Category-Filtered Search
-
-```python
-# Search within specific categories
-search = arxiv.Search(
-    query="cat:cs.CL AND transformer",
-    max_results=20,
-    sort_by=arxiv.SortCriterion.SubmittedDate
-)
-
-# Multiple categories
-search = arxiv.Search(
-    query="(cat:cs.AI OR cat:cs.LG) AND reinforcement learning",
-    max_results=15
-)
-```
-
-### Download Papers
-
-```python
-import os
-
-search = arxiv.Search(query="attention mechanism", max_results=5)
-client = arxiv.Client()
-download_dir = os.path.expanduser("~/papers/attention")
-os.makedirs(download_dir, exist_ok=True)
-
-for result in client.results(search):
-    # Download PDF
-    result.download_pdf(dirpath=download_dir)
-    print(f"Downloaded: {result.title}")
-
-    # Download source (LaTeX) if available
-    result.download_source(dirpath=download_dir)
-```
-
-## Usage — PowerShell Script
-
-### Search
-
-```powershell
-# Basic search
-.\arxiv.ps1 -Action search -Query "machine learning"
-
-# With max results
-.\arxiv.ps1 -Action search -Query "neural networks" -MaxResults 10
-
-# Filter by category
-.\arxiv.ps1 -Action search -Query "deep learning" -Categories "cs,stat"
-```
-
-### Download
-
-```powershell
-# Download by arXiv ID
-.\arxiv.ps1 -Action download -ArxivId "1706.03762"
-
-# Download to specific directory
-.\arxiv.ps1 -Action download -ArxivId "2301.13688" -OutputDir "C:\Papers"
-```
-
-## Advanced Queries
-
-The arXiv API supports a rich query syntax:
-
-| Operator | Meaning | Example |
-|----------|---------|---------|
-| `AND` | Both terms | `"deep learning" AND "drug discovery"` |
-| `OR` | Either term | `"GAN" OR "diffusion model"` |
-| `ANDNOT` | Exclude term | `"NLP" ANDNOT "translation"` |
-| `au:` | Author | `au:"Hinton"` |
-| `ti:` | Title contains | `ti:"attention"` |
-| `abs:` | Abstract contains | `abs:"protein folding"` |
-| `cat:` | Category | `cat:cs.CV` |
-
-### Complex Query Examples
-
-```python
-# Papers by a specific author on a specific topic
-search = arxiv.Search(query='au:"Yann LeCun" AND ti:"self-supervised"')
-
-# Recent papers in two categories excluding surveys
-search = arxiv.Search(
-    query='(cat:cs.CL OR cat:cs.AI) AND "large language model" ANDNOT ti:"survey"',
-    sort_by=arxiv.SortCriterion.SubmittedDate,
-    max_results=50
-)
-```
-
-## Building a Local Paper Library
-
-```python
-import arxiv
-import json
-import os
-from datetime import datetime
-
-def build_library(queries: dict, base_dir: str = "~/papers"):
-    """Build an organized paper library from multiple search queries."""
-    base = os.path.expanduser(base_dir)
-    catalog = []
-    client = arxiv.Client()
-
-    for topic, query in queries.items():
-        topic_dir = os.path.join(base, topic)
-        os.makedirs(topic_dir, exist_ok=True)
-
-        search = arxiv.Search(query=query, max_results=20,
-                              sort_by=arxiv.SortCriterion.SubmittedDate)
-
-        for paper in client.results(search):
-            paper.download_pdf(dirpath=topic_dir)
-            catalog.append({
-                "id": paper.entry_id,
-                "title": paper.title,
-                "authors": [a.name for a in paper.authors],
-                "published": paper.published.isoformat(),
-                "topic": topic,
-                "pdf_path": os.path.join(topic_dir, f"{paper.get_short_id()}.pdf")
-            })
-
-    # Save catalog
-    with open(os.path.join(base, "catalog.json"), "w") as f:
-        json.dump(catalog, f, indent=2)
-    print(f"Library built: {len(catalog)} papers in {len(queries)} topics")
-
-# Usage
-build_library({
-    "rag": "cat:cs.CL AND retrieval augmented generation",
-    "agents": "cat:cs.AI AND (LLM agent OR tool use)",
-    "evaluation": "cat:cs.CL AND (benchmark OR evaluation) AND language model"
-})
-```
-
-## Rate Limits
-
-- arXiv API: **1 request per 3 seconds** for automated access
-- The `arxiv` Python client handles rate limiting automatically
-- For large-scale downloads, add explicit delays: `time.sleep(3)`
-- Respect the [arXiv API Terms of Use](https://info.arxiv.org/help/api/tou.html)
-
-## References
-
-- [arxiv Python Client](https://github.com/lukasschwab/arxiv.py)
-- [arXiv API User Manual](https://info.arxiv.org/help/api/user-manual.html)
-- [arXiv Category Taxonomy](https://arxiv.org/category_taxonomy)
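The removed skill's Rate Limits section recommends explicit `time.sleep(3)` delays for large-scale downloads. That advice can be wrapped in a small reusable throttle; the sketch below is stdlib-only and illustrative (the `Throttle` class is not part of arxiv-osiris), using a short interval so it runs quickly:

```python
import time

class Throttle:
    """Client-side rate limiter: at most one call every `interval` seconds."""

    def __init__(self, interval: float = 3.0):
        self.interval = interval
        self._last = float("-inf")  # allow the first call immediately

    def wait(self) -> None:
        elapsed = time.monotonic() - self._last
        if elapsed < self.interval:
            time.sleep(self.interval - elapsed)
        self._last = time.monotonic()

# In practice each iteration would wrap a result.download_pdf(...) call;
# interval=3.0 matches arXiv's 1-request-per-3-seconds guideline.
throttle = Throttle(interval=0.1)
start = time.monotonic()
for _ in range(3):
    throttle.wait()
elapsed = time.monotonic() - start
print(f"3 throttled calls took {elapsed:.2f}s")
```

Since the first call passes through immediately, three calls at a 0.1 s interval take roughly 0.2 s; with `interval=3.0` this stays within the published rate limit.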

package/skills/literature/search/deepgit-search-guide/SKILL.md
@@ -1,147 +0,0 @@
----
-name: deepgit-search-guide
-description: "Deep research tool for discovering academic code in Git repositories"
-version: 1.0.0
-author: wentor-community
-source: https://github.com/DeepGit/DeepGit
-metadata:
-  openclaw:
-    category: "literature"
-    subcategory: "search"
-    keywords:
-      - git-search
-      - code-discovery
-      - repository-analysis
-      - research-code
-      - implementation-search
-      - open-source
----
-
-# DeepGit Search Guide
-
-A skill for conducting deep searches across Git repositories to discover research implementations, datasets, and academic code artifacts. Based on DeepGit (852 stars), this skill helps researchers find, evaluate, and utilize open-source code associated with academic publications.
-
-## Overview
-
-Modern academic research increasingly relies on code for data analysis, model implementation, and experiment reproduction. However, finding the right repository among millions on GitHub requires more than simple keyword search. DeepGit applies deep research techniques to repository discovery, combining semantic code understanding, README analysis, citation linking, and quality assessment to surface the most relevant and reliable research code.
-
-This skill is essential for researchers who want to build on existing implementations rather than reinventing from scratch, verify published results through code inspection, or find reference implementations of algorithms described in papers.
-
-## Search Strategies
-
-**Keyword-Based Search**
-- Start with the paper title, method name, or algorithm as search terms
-- Include the first author's name or institution to narrow results
-- Add framework-specific terms (PyTorch, TensorFlow, scikit-learn) when looking for specific implementations
-- Use language filters to find implementations in your preferred programming language
-- Combine topic tags (machine-learning, deep-learning, nlp, cv) with method-specific terms
-
-**Paper-Linked Search**
-- Many papers include a "Code available at" link; extract and verify these first
-- Search Papers with Code for repository links associated with specific papers
-- Check the paper's Semantic Scholar or Google Scholar entry for linked code
-- Look for the paper's arXiv abstract, which often contains a GitHub link
-- Search for the paper's DOI or arXiv ID in GitHub README files
-
-**Author-Based Search**
-- Visit the first author's or corresponding author's GitHub profile
-- Check the research group's or lab's GitHub organization page
-- Look for personal academic websites that link to code repositories
-- Search the author's ORCID or Google Scholar profile for linked repositories
-- Follow the author's collaborators, who may have contributed to or forked the code
-
-**Citation-Chain Search**
-- Find code for papers that cite or are cited by the target paper
-- Implementations of closely related methods often share similar repository structures
-- Forked repositories may contain adaptations for different datasets or settings
-- Look at the "Used by" and "Forks" tabs on GitHub for derivative work
-- Check awesome-lists in the relevant field for curated repository collections
-
-## Repository Evaluation
-
-Once candidate repositories are found, evaluate them systematically:
-
-**Code Quality Indicators**
-- README completeness: clear description, installation instructions, usage examples
-- Documentation: API documentation, tutorials, or walkthroughs
-- Test coverage: presence of test files and CI/CD configuration
-- Code organization: logical directory structure, modular design
-- Dependencies: clear requirements file with pinned versions
-
-**Reproducibility Assessment**
-- Does the README specify how to reproduce the paper's results?
-- Are pretrained models or checkpoints provided?
-- Is the training data available, or are instructions for obtaining it provided?
-- Are random seeds and hardware specifications documented?
-- Do the reported results match the paper's claims?
-
-**Maintenance Status**
-- Last commit date: recent activity suggests active maintenance
-- Issue response time: how quickly issues are acknowledged and addressed
-- Open issues count: a high ratio of open to closed issues may indicate abandonment
-- Release history: regular releases suggest mature, stable software
-- Contributor count: multiple contributors indicate community involvement
-
-**Community Signals**
-- Star count: a general popularity indicator (but not a quality guarantee)
-- Fork count: indicates others are building on the work
-- Citation count of the associated paper
-- Mentions in academic forums, Twitter, or blog posts
-- Inclusion in curated awesome-lists or benchmark suites
-
-## Working with Research Code
-
-**Getting Started**
-- Clone the repository and read the entire README before proceeding
-- Check the requirements file and create an isolated environment (conda, venv, Docker)
-- Install dependencies using the exact versions specified
-- Run the provided tests or examples to verify the installation
-- Start with the simplest example before attempting full reproduction
-
-**Common Challenges**
-- Missing dependencies not listed in requirements
-- Hardcoded paths that need to be adapted to your environment
-- GPU memory requirements exceeding available hardware
-- Dataset preprocessing steps not documented or automated
-- Version conflicts between required packages
-
-**Adaptation Strategies**
-- Fork the repository before making modifications for your use case
-- Document all changes you make in a changelog or commit messages
-- Keep the original code as a reference branch for comparison
-- Submit bug fixes back to the original repository as pull requests
-- Cite the repository in your publications using its preferred citation format
-
-## Organizing Discovered Repositories
-
-**Local Catalog**
-- Maintain a structured record of discovered repositories with metadata
-- Fields: paper title, authors, year, repo URL, stars, language, framework, reproduction status
-- Tag repositories by topic, method, and dataset for cross-referencing
-- Track which repositories you have successfully run and which had issues
-- Note the key configuration settings that made reproduction work
-
-**Integration with Reference Management**
-- Link repository entries to corresponding Zotero or BibTeX references
-- Use Zotero's URL field to store repository links alongside paper PDFs
-- Tag references with "has-code" or "code-verified" for filtering
-- Include repository URLs in your literature notes
-
-## Integration with Research-Claw
-
-This skill enhances the Research-Claw code discovery workflow:
-
-- Search for implementations after discovering relevant papers through literature skills
-- Feed discovered code to analysis skills for experiment replication
-- Connect with writing skills to properly cite code and data sources
-- Store repository evaluations in the knowledge base for team access
-- Automate periodic checks for new repositories related to ongoing projects
-
-## Best Practices
-
-- Always check the repository's license before using code in your own projects
-- Cite both the paper and the repository when using others' code
-- Verify reproduction results before building on top of existing implementations
-- Contribute back improvements, bug fixes, and documentation to the community
-- Keep local copies of critical repositories in case they are deleted or moved
-- Document your environment setup steps so collaborators can replicate your results
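The removed guide's "Local Catalog" fields (paper title, repo URL, stars, language, framework, reproduction status, tags) map naturally onto a small JSON record. A minimal stdlib sketch — the `catalog_entry` helper and the example repo URL are illustrative, not part of DeepGit:

```python
import json
from datetime import date

def catalog_entry(paper_title, repo_url, stars, language, framework,
                  reproduction_status, tags):
    """One structured record for a local repository catalog."""
    return {
        "paper_title": paper_title,
        "repo_url": repo_url,
        "stars": stars,
        "language": language,
        "framework": framework,
        "reproduction_status": reproduction_status,  # e.g. "verified", "failed", "untested"
        "tags": tags,                                # topic/method/dataset tags
        "recorded": date.today().isoformat(),
    }

entry = catalog_entry(
    "Attention Is All You Need",
    "https://example.org/some-lab/transformer-impl",  # placeholder URL
    1200, "Python", "PyTorch", "untested", ["nlp", "transformer"],
)
print(json.dumps(entry, indent=2))
```

A list of such records serialized to one JSON file is enough to filter by tag or reproduction status and to cross-link entries with Zotero/BibTeX references.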

package/skills/literature/search/multi-database-literature-search/SKILL.md
@@ -1,198 +0,0 @@
----
-name: multi-database-literature-search
-description: "Conduct comprehensive literature searches across multiple academic databases"
-metadata:
-  openclaw:
-    emoji: "🔍"
-    category: "literature"
-    subcategory: "search"
-    keywords: ["multi-database search", "systematic review", "literature search", "academic databases", "cross-database", "search strategy"]
-    source: "https://clawhub.ai/jpjy/literature-search"
----
-
-# Multi-Database Literature Search
-
-## Overview
-
-No single database covers all academic literature. A comprehensive search requires querying multiple databases, each with its own coverage, search syntax, and strengths. This guide provides a structured approach to searching across Google Scholar, PubMed, Semantic Scholar, arXiv, IEEE Xplore, ACM Digital Library, and Scopus/Web of Science, with strategies for deduplication and result management.
-
-## Database Coverage Map
-
-| Database | Coverage | Strengths | Free? |
-|----------|----------|-----------|-------|
-| **Google Scholar** | All disciplines, broadest | Grey literature, books, citations | Yes |
-| **Semantic Scholar** | 220M+ papers, all fields | AI-powered relevance, citation context, TLDR | Yes |
-| **PubMed** | Biomedical, life sciences | MeSH terms, clinical trials, 36M+ records | Yes |
-| **arXiv** | Physics, CS, math, econ, stats | Preprints, latest research, open access | Yes |
-| **OpenAlex** | 250M+ works, all fields | Open metadata, citation network, concepts | Yes |
-| **Scopus** | All disciplines | Citation metrics, author profiles | Subscription |
-| **Web of Science** | All disciplines | Impact factors, citation reports | Subscription |
-| **IEEE Xplore** | Engineering, CS | IEEE/IET publications, standards | Partial |
-| **ACM DL** | Computer science | ACM proceedings, computing reviews | Partial |
-| **SSRN** | Social sciences, economics | Working papers, preprints | Yes |
-| **JSTOR** | Humanities, social sciences | Historical archives, journals | Partial |
-
-## Search Strategy Design
-
-### Step 1: Decompose Your Question
-
-```
-Research question:
-"How does remote work affect employee productivity in knowledge-intensive firms?"
-
-Concept blocks:
-Block A: remote work | telework | work from home | telecommuting | hybrid work
-Block B: productivity | performance | output | efficiency | effectiveness
-Block C: knowledge work | knowledge-intensive | white collar | professional
-```
-
-### Step 2: Build Database-Specific Queries
-
-Each database has different syntax. Translate your concept blocks:
-
-**Google Scholar**:
-```
-("remote work" OR telework OR "work from home") AND
-(productivity OR performance OR output) AND
-("knowledge work" OR "knowledge-intensive" OR professional)
-```
-
-**PubMed**:
-```
-("remote work"[Title/Abstract] OR "telework"[Title/Abstract] OR
- "work from home"[Title/Abstract]) AND
-("productivity"[Title/Abstract] OR "performance"[Title/Abstract]) AND
-("knowledge workers"[Title/Abstract] OR "professional"[Title/Abstract])
-```
-
-**Semantic Scholar API**:
-```bash
-curl "https://api.semanticscholar.org/graph/v1/paper/search?\
-query=remote+work+productivity+knowledge+workers&\
-year=2019-2026&\
-fieldsOfStudy=Economics,Business&\
-limit=100&\
-fields=title,authors,year,abstract,citationCount,url"
-```
-
-**arXiv**:
-```
-all:"remote work" AND all:productivity AND cat:econ.*
-```
-
-### Step 3: Execute Searches Systematically
-
-```markdown
-## Search Log Template (PRISMA-compliant)
-
-| # | Database | Date | Query String | Filters | Results | Relevant | Notes |
-|---|----------|------|-------------|---------|---------|----------|-------|
-| 1 | Google Scholar | 2026-03-10 | [full query] | 2019-2026 | 1,240 | ~80 | Top 200 screened |
-| 2 | Semantic Scholar | 2026-03-10 | [full query] | Year ≥ 2019 | 487 | ~45 | API, sorted by relevance |
-| 3 | PubMed | 2026-03-10 | [full query] | 5 years | 156 | ~30 | MeSH term: Teleworking |
-| 4 | SSRN | 2026-03-10 | [full query] | — | 89 | ~20 | Working papers |
-| 5 | Scopus | 2026-03-10 | [full query] | 2019-2026 | 312 | ~55 | Most overlap with GS |
-```
-
-## Deduplication
-
-After collecting results from multiple databases, remove duplicates:
-
-```python
-import pandas as pd
-from fuzzywuzzy import fuzz
-
-def deduplicate_papers(df: pd.DataFrame, title_col: str = "title",
-                       threshold: int = 90) -> pd.DataFrame:
-    """Remove duplicate papers based on fuzzy title matching."""
-    df = df.sort_values("citation_count", ascending=False)
-    keep = []
-    seen_titles = []
-
-    for _, row in df.iterrows():
-        title = row[title_col].lower().strip()
-        is_dup = False
-        for seen in seen_titles:
-            if fuzz.ratio(title, seen) >= threshold:
-                is_dup = True
-                break
-        if not is_dup:
-            keep.append(row)
-            seen_titles.append(title)
-
-    result = pd.DataFrame(keep)
-    print(f"Deduplicated: {len(df)} → {len(result)} ({len(df)-len(result)} duplicates removed)")
-    return result
-
-# Usage
-all_results = pd.concat([gs_results, s2_results, pubmed_results, scopus_results])
-unique = deduplicate_papers(all_results)
-```
-
-### DOI-Based Deduplication (More Reliable)
-
-```python
-def deduplicate_by_doi(df: pd.DataFrame) -> pd.DataFrame:
-    """Primary: DOI match. Fallback: fuzzy title match for missing DOIs."""
-    with_doi = df[df["doi"].notna()].drop_duplicates(subset="doi", keep="first")
-    without_doi = df[df["doi"].isna()]
-    without_doi_deduped = deduplicate_papers(without_doi, threshold=85)
-    return pd.concat([with_doi, without_doi_deduped]).reset_index(drop=True)
-```
-
-## Screening Workflow
-
-### Title/Abstract Screening
-
-```markdown
-After deduplication, screen titles and abstracts:
-
-Include if:
-□ Directly addresses research question
-□ Empirical study with data OR systematic review
-□ Published in peer-reviewed venue OR reputable preprint server
-□ Written in English or Chinese
-
-Exclude if:
-□ Irrelevant population (e.g., manual labor when studying knowledge work)
-□ No empirical component (pure opinion)
-□ Duplicate or superseded version
-□ Cannot access full text (after OA and institutional access attempts)
-```
-
-### Citation Chaining
-
-After initial screening, expand coverage:
-
-```
-Forward citation (who cited this paper?):
-- Semantic Scholar: "Citations" tab
-- Google Scholar: "Cited by" link
-- Web of Science: "Citing Articles"
-
-Backward citation (what does this paper cite?):
-- Read the reference list of each key paper
-- Identify seminal works and foundational papers
-
-Typically adds 15-30% more relevant papers beyond database searches
-```
-
-## Recommended Search Order
-
-For maximum coverage with minimum effort:
-
-```
-1. Semantic Scholar (broad coverage, AI-powered ranking, free API)
-2. Google Scholar (broadest coverage, catches grey literature)
-3. Domain-specific DB (PubMed for biomedical, arXiv for CS/physics, SSRN for social science)
-4. Scopus or Web of Science (if institutional access available — adds citation metrics)
-5. Citation chaining from top 10 most relevant papers found so far
-6. Grey literature: Google, institutional repositories, conference websites
-```
-
-## References
-
-- Moher, D., et al. (2009). "PRISMA Statement." *BMJ*, 339, b2535.
-- Bramer, W. M., et al. (2017). "De-duplication of database search results." *BMC Medical Research Methodology*, 17(1), 1-9.
-- [Semantic Scholar API](https://api.semanticscholar.org/)
-- [PubMed Search Guide](https://pubmed.ncbi.nlm.nih.gov/help/)
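The removed skill's fuzzy-title deduplication depends on `fuzzywuzzy`; the same idea works with only the standard library via `difflib.SequenceMatcher`. A minimal sketch (function names are illustrative, and it operates on plain title strings rather than DataFrames):

```python
from difflib import SequenceMatcher

def title_key(title: str) -> str:
    """Normalize a title: lowercase, drop punctuation, collapse whitespace."""
    cleaned = "".join(c for c in title.lower() if c.isalnum() or c.isspace())
    return " ".join(cleaned.split())

def is_duplicate(a: str, b: str, threshold: float = 0.90) -> bool:
    """Treat two titles as duplicates when their normalized similarity is high."""
    return SequenceMatcher(None, title_key(a), title_key(b)).ratio() >= threshold

def deduplicate(titles: list) -> list:
    """Keep the first occurrence of each near-identical title."""
    kept = []
    for t in titles:
        if not any(is_duplicate(t, k) for k in kept):
            kept.append(t)
    return kept

papers = [
    "Attention Is All You Need",
    "Attention is all you need.",            # same paper, different casing/punctuation
    "Language Models are Few-Shot Learners",
]
print(deduplicate(papers))  # the second title is dropped as a duplicate
```

`SequenceMatcher.ratio()` is slower than `fuzz.ratio` on large corpora, but for a few thousand records it avoids an extra dependency; the threshold plays the same role as the `threshold=90` parameter above (scaled to 0–1).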