npm - @wentorai/research-plugins - Versions diffs - 1.0.0 - Mend

@wentorai/research-plugins 1.0.0

Files changed (252)

package/skills/literature/search/semantic-scholar-api/SKILL.md ADDED

@@ -0,0 +1,134 @@
+---
+name: semantic-scholar-api
+description: "Search papers and analyze citation graphs via Semantic Scholar"
+metadata:
+  openclaw:
+    emoji: "🔍"
+    category: "literature"
+    subcategory: "search"
+    keywords: ["academic database search", "semantic search", "AI-powered literature search", "citation analysis", "citation network"]
+    source: "https://api.semanticscholar.org/"
+---
+# Semantic Scholar API Guide
+## Overview
+Semantic Scholar is a free, AI-powered research tool created by the Allen Institute for AI (AI2) that indexes over 200 million academic papers across all fields of science. Unlike traditional keyword-based search engines, Semantic Scholar uses natural language processing and machine learning to understand paper content, identify influential citations, and surface the most relevant results.
+The Semantic Scholar Academic Graph API provides structured access to papers, authors, citations, and references. It distinguishes between influential and non-influential citations using a trained classifier, helping researchers quickly identify the most impactful works in any field. The API also provides TLDR summaries generated by AI for many papers.
+The API can be used without authentication for basic access. Registering for a free API key unlocks higher rate limits and is recommended for production applications. The API returns clean JSON responses and supports field selection to minimize response payload size.
+## Authentication
+No authentication is required for basic usage. For higher rate limits, request a free API key at https://www.semanticscholar.org/product/api and include it as a header:
+```
+x-api-key: YOUR_API_KEY
+```
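+Without an API key, requests draw on a rate-limit pool shared by all unauthenticated clients (5,000 requests per 5 minutes), so the throughput available to any single client is not guaranteed. With a key, you get a dedicated limit, typically starting at 1 request per second sustained, which is more predictable even though the nominal figure is lower.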
+## Core Endpoints
+### Paper Search: Find Papers by Query
+- **URL**: `GET https://api.semanticscholar.org/graph/v1/paper/search`
+- **Parameters**:
+  | Param | Type | Required | Description |
+  |-------|------|----------|-------------|
+  | query | string | Yes | Search query string |
+  | offset | integer | No | Pagination offset (default: 0) |
+  | limit | integer | No | Results per page (default: 10, max: 100) |
+  | fields | string | No | Comma-separated fields to return (e.g., title,abstract,year,citationCount) |
+  | year | string | No | Year range filter (e.g., 2020-2024 or 2024-) |
+  | fieldsOfStudy | string | No | Filter by field (e.g., Computer Science, Medicine) |
+- **Example**:
+  ```bash
+  curl "https://api.semanticscholar.org/graph/v1/paper/search?query=attention+is+all+you+need&limit=5&fields=title,year,citationCount,authors,tldr"
+  ```
+- **Response**: JSON with `total`, `offset`, and `data` array containing paper objects with requested fields.
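The `offset`, `limit`, and `total` fields described above can be combined into a simple pager. The following is a minimal sketch, not part of the package: the names `iter_search_results` and `make_fetcher` are hypothetical, and the third-party `requests` library is assumed to be installed only if the real fetcher is used.

```python
from typing import Callable, Iterator

SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def iter_search_results(fetch_page: Callable[[int, int], dict],
                        page_size: int = 100,
                        max_results: int = 500) -> Iterator[dict]:
    """Yield paper objects page by page until results run out
    or max_results papers have been yielded."""
    offset = 0
    yielded = 0
    while yielded < max_results:
        page = fetch_page(offset, page_size)
        papers = page.get("data", [])
        if not papers:
            break
        for paper in papers:
            yield paper
            yielded += 1
            if yielded >= max_results:
                return
        offset += len(papers)
        # Stop once the reported total has been consumed
        if offset >= page.get("total", 0):
            break


def make_fetcher(query: str, fields: str = "title,year,citationCount"):
    """Build a fetch_page(offset, limit) callable backed by the live API."""
    import requests  # third-party; imported lazily so the pager works offline

    def fetch_page(offset: int, limit: int) -> dict:
        params = {"query": query, "offset": offset,
                  "limit": limit, "fields": fields}
        resp = requests.get(SEARCH_URL, params=params, timeout=30)
        resp.raise_for_status()
        return resp.json()

    return fetch_page
```

Passing the fetcher in as a callable keeps the paging logic separate from the HTTP call, so it can be exercised with canned responses before touching the rate-limited API.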
+### Paper Details: Retrieve Full Paper Metadata
+- **URL**: `GET https://api.semanticscholar.org/graph/v1/paper/{paper_id}`
+- **Parameters**:
+  | Param | Type | Required | Description |
+  |-------|------|----------|-------------|
+  | paper_id | string | Yes | Semantic Scholar ID, DOI, ArXiv ID, or other identifier (e.g., DOI:10.1234/...) |
+  | fields | string | No | Comma-separated fields to return |
+- **Example**:
+  ```bash
+  curl "https://api.semanticscholar.org/graph/v1/paper/DOI:10.18653/v1/N19-1423?fields=title,abstract,year,citationCount,influentialCitationCount,references,citations"
+  ```
+- **Response**: JSON with full paper metadata including `paperId`, `title`, `abstract`, `year`, `citationCount`, `influentialCitationCount`, `references`, and `citations`.
+### Author Search: Find Researchers
+- **URL**: `GET https://api.semanticscholar.org/graph/v1/author/search`
+- **Parameters**:
+  | Param | Type | Required | Description |
+  |-------|------|----------|-------------|
+  | query | string | Yes | Author name query |
+  | offset | integer | No | Pagination offset |
+  | limit | integer | No | Results per page (max: 1000) |
+  | fields | string | No | Fields to return (e.g., name,paperCount,citationCount,hIndex) |
+- **Example**:
+  ```bash
+  curl "https://api.semanticscholar.org/graph/v1/author/search?query=Yoshua+Bengio&fields=name,paperCount,citationCount,hIndex"
+  ```
+- **Response**: JSON with author profiles including publication and citation metrics.
+### Dataset Releases: Bulk Data Access
+- **URL**: `GET https://api.semanticscholar.org/datasets/v1/release`
+- **Parameters**:
+  | Param | Type | Required | Description |
+  |-------|------|----------|-------------|
+  | (none) | - | - | Returns list of available dataset releases |
+- **Example**:
+  ```bash
+  curl "https://api.semanticscholar.org/datasets/v1/release"
+  ```
+- **Response**: JSON array of release identifiers (dates) for bulk dataset downloads.
+## Rate Limits
+Without API key: 5,000 requests per 5 minutes, shared across all unauthenticated clients (roughly 16.7 requests per second for the whole pool, not per client). With API key: a dedicated limit that varies by key tier, typically starting at 1 request per second. The API returns HTTP 429 when limits are exceeded. If the response includes a `Retry-After` header, wait that long before retrying; otherwise back off exponentially. Batch endpoints are available for retrieving multiple papers or authors in a single request, which is more efficient than individual lookups.
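One way to handle HTTP 429 is sketched below. It is an illustration under stated assumptions rather than documented client behavior: `retry_delay` and `call_with_retry` are hypothetical names, the `send` callable is assumed to return a `(status, headers, body)` tuple, and the fallback to exponential backoff is a design choice for responses that carry no numeric `Retry-After` value.

```python
import time
from typing import Callable, Mapping, Tuple


def retry_delay(headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next try: a numeric Retry-After value
    if present, otherwise capped exponential backoff."""
    value = headers.get("Retry-After")  # plain dicts are case-sensitive
    if value is not None:
        try:
            return max(0.0, float(value))
        except ValueError:
            pass  # e.g. an HTTP-date; fall back to backoff
    return min(cap, base * (2 ** attempt))


def call_with_retry(send: Callable[[], Tuple[int, Mapping[str, str], object]],
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep):
    """Call send() until it returns a status other than 429
    or max_attempts is exhausted. Returns (status, body)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(retry_delay(headers, attempt))
    return status, body
```

With `requests`, `send` could wrap a single `requests.get(...)` call and return `resp.status_code`, `resp.headers`, and the parsed body. Injecting `sleep` makes the wait schedule easy to inspect without actually pausing.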
+## Common Patterns
+### Build a Citation Network
+Retrieve a paper and its citation tree to map influence:
+```bash
+# Get paper with its references and citations
+curl "https://api.semanticscholar.org/graph/v1/paper/CorpusID:49313245?fields=title,citations.title,citations.citationCount,references.title,references.citationCount"
+```
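The JSON returned by a request like the one above can be turned into a directed edge list for graph tools. This is a minimal sketch with a hypothetical helper name, `citation_edges`. It assumes that the top-level object and each nested reference or citation carry a `paperId`, and it skips entries whose `paperId` is missing or null.

```python
def citation_edges(paper: dict) -> list:
    """Return (citing_id, cited_id) pairs from one paper-details response."""
    center = paper["paperId"]
    edges = []
    # The central paper cites each of its references
    for ref in paper.get("references") or []:
        if ref.get("paperId"):
            edges.append((center, ref["paperId"]))
    # Each citing paper cites the central paper
    for cit in paper.get("citations") or []:
        if cit.get("paperId"):
            edges.append((cit["paperId"], center))
    return edges
```

Running this over several seed papers and concatenating the results gives an edge list that can be loaded into a graph library for further analysis.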
+### Find Influential Papers on a Topic
+Search for highly cited and influential works:
+```bash
+curl "https://api.semanticscholar.org/graph/v1/paper/search?query=graph+neural+networks&fields=title,year,citationCount,influentialCitationCount&limit=20"
+```
+### Batch Paper Lookup
+Retrieve metadata for multiple papers in a single request using the batch endpoint:
+```bash
+curl -X POST "https://api.semanticscholar.org/graph/v1/paper/batch?fields=title,year,citationCount" \
+  -H "Content-Type: application/json" \
+  -d '{"ids": ["DOI:10.1038/s41586-021-03819-2", "CorpusID:49313245"]}'
+```
+## References
+- Official documentation: https://api.semanticscholar.org/
+- API tutorial: https://www.semanticscholar.org/product/api/tutorial
+- Semantic Scholar paper: https://arxiv.org/abs/2301.10140

package/skills/literature/search/systematic-search-strategy/SKILL.md ADDED

@@ -0,0 +1,214 @@
+---
+name: systematic-search-strategy
+description: "Construct rigorous systematic search strategies for literature reviews"
+metadata:
+  openclaw:
+    emoji: "🎯"
+    category: "literature"
+    subcategory: "search"
+    keywords: ["search strategy", "Boolean search", "search string construction", "advanced search", "systematic review"]
+    source: "wentor"
+---
+# Systematic Search Strategy
+A skill for designing and executing comprehensive, reproducible literature search strategies for systematic reviews, scoping reviews, and meta-analyses. Follows PRISMA 2020 guidelines and Cochrane Handbook best practices.
+## PICO Framework for Search Design
+Structure your research question using PICO (or variants):
+```
+P - Population / Problem:   Who or what is being studied?
+I - Intervention / Exposure: What is the treatment or exposure?
+C - Comparison:              What is the alternative?
+O - Outcome:                 What is being measured?
+Variants:
+PICOS: adds Study design
+SPIDER: Sample, Phenomenon of Interest, Design, Evaluation, Research type
+PCC:    Population, Concept, Context (for scoping reviews)
+```
+### From PICO to Search Strategy
+```python
+def pico_to_search_blocks(pico: dict) -> dict:
+    """
+    Convert a PICO question into search concept blocks.
+    Args:
+        pico: Dict with keys 'population', 'intervention', 'comparison', 'outcome'
+              Each value is a list of synonyms/related terms
+    Returns:
+        Search blocks ready for Boolean combination
+    """
+    blocks = {}
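+    for component, terms in pico.items():
+        # Expand each term with an exact phrase plus a truncated variant
+        expanded = []
+        for term in terms:
+            expanded.append(f'"{term}"')
+            # Add a truncation variant for longer terms
+            if len(term) > 5:
+                # Drop one plural "s" only; rstrip("s") would also turn
+                # "mindfulness" into "mindfulne"
+                is_plural = term.endswith('s') and not term.endswith('ss')
+                stem = term[:-1] if is_plural else term
+                # Quote multi-word stems so the phrase is not split on spaces
+                expanded.append(f'"{stem}*"' if ' ' in stem else f'{stem}*')
+        blocks[component] = expanded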
+    # Build final query: AND between blocks, OR within blocks
+    query_parts = []
+    for component, terms in blocks.items():
+        block = ' OR '.join(terms)
+        query_parts.append(f'({block})')
+    final_query = ' AND '.join(query_parts)
+    return {
+        'blocks': blocks,
+        'combined_query': final_query,
+        'n_concepts': len(blocks)
+    }
+# Example: RQ: "Does mindfulness meditation reduce anxiety in college students?"
+pico = {
+    'population': ['college students', 'university students', 'undergraduate students',
+                    'higher education students'],
+    'intervention': ['mindfulness', 'mindfulness meditation', 'mindfulness-based stress reduction',
+                     'MBSR', 'mindfulness-based cognitive therapy', 'MBCT'],
+    'outcome': ['anxiety', 'anxiety disorder', 'generalized anxiety', 'test anxiety',
+                'anxiety symptoms', 'state anxiety', 'trait anxiety']
+}
+result = pico_to_search_blocks(pico)
+print(result['combined_query'])
+```
+## Database-Specific Search Syntax
+### Adapting Searches Across Databases
+```python
+def adapt_search_for_database(base_query: str, database: str) -> dict:
+    """
+    Adapt a base search string for different database syntaxes.
+    """
+    adaptations = {
+        'pubmed': {
+            'truncation': '*',
+            'phrase': '"..."',
+            'proximity': '"term1 term2"[tiab:~N]',  # supported in title and title/abstract fields
+            'field_tags': {'title': '[ti]', 'title_abstract': '[tiab]', 'mesh': '[MeSH]'},
+            'notes': 'Add MeSH terms for each concept block'
+        },
+        'web_of_science': {
+            'truncation': '*',
+            'phrase': '"..."',
+            'proximity': 'NEAR/N',
+            'field_tags': {'title': 'TI=', 'topic': 'TS=', 'author': 'AU='},
+            'notes': 'Use TS= for topic search (title+abstract+keywords)'
+        },
+        'scopus': {
+            'truncation': '*',
+            'phrase': '"..."',
+            'proximity': 'W/N',
+            'field_tags': {'title': 'TITLE()', 'title_abs': 'TITLE-ABS-KEY()', 'author': 'AUTH()'},
+            'notes': 'Use TITLE-ABS-KEY() for comprehensive searching'
+        },
+        'psycinfo': {
+            'truncation': '*',
+            'phrase': '"..."',
+            'proximity': 'Nn',
+            'field_tags': {'title': 'TI', 'abstract': 'AB', 'thesaurus': 'DE'},
+            'notes': 'Use DE field for PsycINFO thesaurus terms'
+        }
+    }
+    db = adaptations.get(database.lower(), {})
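+    # No automatic rewriting is done here: the base query is passed through
+    # unchanged and the caller applies the syntax rules returned alongside it
+    adapted = base_query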
+    return {
+        'database': database,
+        'query': adapted,
+        'syntax_notes': db.get('notes', ''),
+        'truncation': db.get('truncation', '*'),
+        'field_tags': db.get('field_tags', {})
+    }
+```
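To show how the field tags listed above attach to a concept block, here is a small sketch. The helper `tag_block` is hypothetical and covers only PubMed, Web of Science, and Scopus. PsycINFO is left out because its syntax depends on the host platform.

```python
def tag_block(terms: list, database: str) -> str:
    """Wrap one OR-joined concept block in a database's title/abstract tag.

    terms are already formatted search terms, e.g. '"college students"'
    or 'undergraduate*'.
    """
    joined = " OR ".join(terms)
    db = database.lower()
    if db == "pubmed":
        # PubMed field tags follow each individual term
        return "(" + " OR ".join(f"{term}[tiab]" for term in terms) + ")"
    if db == "web_of_science":
        # TS= searches title, abstract, and keywords
        return f"TS=({joined})"
    if db == "scopus":
        return f"TITLE-ABS-KEY({joined})"
    raise ValueError(f"No tagging rule defined for database: {database}")
```

For example, `tag_block(['"college students"', 'undergraduate*'], 'scopus')` yields `TITLE-ABS-KEY("college students" OR undergraduate*)`. Tagged blocks for each concept can then be joined with `AND`.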
+## Search Documentation
+### PRISMA-S Reporting Checklist
+Document every search completely:
+```yaml
+search_documentation:
+  date_searched: "2026-03-09"
+  databases:
+    - name: "PubMed/MEDLINE"
+      interface: "PubMed.gov"
+      date_coverage: "1966-present"
+      search_string: |
+        (("college students"[tiab] OR "university students"[tiab])
+        AND ("mindfulness"[tiab] OR "MBSR"[tiab])
+        AND ("anxiety"[tiab] OR "anxiety disorders"[MeSH]))
+      results_count: 342
+      filters_applied: "English language; 2010-2026"
+    - name: "Web of Science"
+      interface: "Clarivate"
+      date_coverage: "1900-present"
+      search_string: |
+        TS=("college student*" OR "university student*")
+        AND TS=(mindfulness OR MBSR OR MBCT)
+        AND TS=(anxiety)
+      results_count: 287
+      filters_applied: "Article or Review; English; 2010-2026"
+  grey_literature:
+    - "ProQuest Dissertations (N=45)"
+    - "Google Scholar first 200 results"
+    - "OpenGrey (N=12)"
+    - "Hand-searched reference lists of included studies"
+  total_before_dedup: 686
+  total_after_dedup: 493
+  deduplication_tool: "Covidence"
+```
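The `total_before_dedup` and `total_after_dedup` figures above come from a deduplication step. A dedicated tool is the usual choice, but the idea can be sketched in a few lines. The functions `record_key` and `deduplicate` are hypothetical, and the matching rule (DOI if present, otherwise normalized title plus year) is an assumption that will miss duplicates where only one copy has a DOI.

```python
import re


def record_key(record: dict) -> str:
    """Build a comparison key: the DOI when available,
    otherwise a normalized title plus publication year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    title = (record.get("title") or "").lower()
    # Collapse punctuation and whitespace so trivial differences do not matter
    title = re.sub(r"[^a-z0-9]+", " ", title).strip()
    return f"title:{title}|{record.get('year', '')}"


def deduplicate(records: list) -> list:
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Recording `len(records)` and `len(deduplicate(records))` gives the before and after counts for the documentation block.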
+## Screening Workflow
+### PRISMA Flow Diagram Data
+```python
+def prisma_flow(records: dict) -> str:
+    """Generate PRISMA 2020 flow diagram data."""
+    flow = f"""
+    IDENTIFICATION
+      Records from databases: {records['from_databases']}
+      Records from other sources: {records['from_other']}
+      Duplicates removed: {records['duplicates']}
+      Records after dedup: {records['from_databases'] + records['from_other'] - records['duplicates']}
+    SCREENING
+      Title/abstract screened: {records['screened']}
+      Excluded at title/abstract: {records['excluded_screening']}
+      Full-text assessed: {records['fulltext_assessed']}
+      Excluded at full-text: {records['excluded_fulltext']}
+        Reasons: {records.get('exclusion_reasons', 'See table')}
+    INCLUDED
+      Studies in qualitative synthesis: {records['included_qualitative']}
+      Studies in meta-analysis: {records.get('included_meta', 'N/A')}
+    """
+    return flow
+```
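Before rendering the flow diagram, it can help to confirm that the counts agree from stage to stage. The sketch below uses the same dictionary keys as `prisma_flow`. The function name `check_prisma_counts` is hypothetical, and it assumes every record passes directly to the next stage, which will not hold if, for example, some reports could not be retrieved.

```python
def check_prisma_counts(records: dict) -> list:
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []

    after_dedup = (records['from_databases'] + records['from_other']
                   - records['duplicates'])
    if records['screened'] != after_dedup:
        problems.append(
            f"screened={records['screened']} but records after dedup={after_dedup}")

    expected_fulltext = records['screened'] - records['excluded_screening']
    if records['fulltext_assessed'] != expected_fulltext:
        problems.append(
            f"fulltext_assessed={records['fulltext_assessed']} "
            f"but screened - excluded_screening={expected_fulltext}")

    expected_included = records['fulltext_assessed'] - records['excluded_fulltext']
    if records['included_qualitative'] != expected_included:
        problems.append(
            f"included_qualitative={records['included_qualitative']} "
            f"but fulltext_assessed - excluded_fulltext={expected_included}")

    return problems
```

An empty list means the arithmetic is internally consistent. It says nothing about whether the underlying counts are correct.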
+## Iterating and Refining
+After initial search execution:
+1. Check sensitivity: Are known relevant papers (seed papers) captured?
+2. Check precision: What proportion of results are relevant? (Target >5% for systematic reviews)
+3. If too many results: Add specificity with additional concept blocks or filters
+4. If too few results: Broaden terms, add synonyms, remove restrictive blocks
+5. Consult a research librarian for complex searches -- they are expert search strategists
+Document every modification to the search strategy with rationale to maintain transparency and reproducibility.
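The sensitivity and precision checks in steps 1 and 2 reduce to two ratios. The following is a minimal sketch with hypothetical function names. It assumes records are identified by comparable IDs such as DOIs, and that relevance has already been judged on a screened sample.

```python
def search_sensitivity(retrieved_ids, seed_ids) -> float:
    """Fraction of known relevant (seed) papers that the search retrieved."""
    seeds = set(seed_ids)
    if not seeds:
        raise ValueError("At least one seed paper is required")
    return len(seeds & set(retrieved_ids)) / len(seeds)


def missing_seeds(retrieved_ids, seed_ids) -> set:
    """Seed papers the search failed to capture; inspect these for missing terms."""
    return set(seed_ids) - set(retrieved_ids)


def search_precision(n_relevant: int, n_retrieved: int) -> float:
    """Fraction of retrieved records judged relevant."""
    if n_retrieved <= 0:
        raise ValueError("n_retrieved must be positive")
    return n_relevant / n_retrieved
```

A sensitivity below 1.0 points to seed papers whose indexing terms are absent from the strategy. The output of `missing_seeds` shows which ones to examine.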

package/skills/research/automation/ai-scientist-guide/SKILL.md ADDED

@@ -0,0 +1,228 @@
+---
+name: ai-scientist-guide
+description: "End-to-end automated scientific discovery with AI Scientist v2"
+metadata:
+  openclaw:
+    emoji: "🤖"
+    category: "research"
+    subcategory: "automation"
+    keywords: ["ai-scientist", "research automation", "scientific workflow", "AI experiment design"]
+    source: "https://github.com/SakanaAI/AI-Scientist-v2"
+---
+# AI Scientist Guide
+## Overview
+The AI Scientist v2 is a fully autonomous scientific research system developed by Sakana AI that can generate hypotheses, run experiments, analyze data, and write complete scientific manuscripts. It represents the cutting edge of AI-driven research automation, having produced the first paper written entirely by AI to be accepted through peer review, at an ICLR 2025 workshop.
+Unlike its predecessor (v1), the AI Scientist v2 removes reliance on human-authored templates, generalizes across Machine Learning domains, and employs a progressive agentic tree search guided by an experiment manager agent. This guide explains how to set up, configure, and use the system effectively, as well as how to integrate its principles into your own research workflows.
+This skill is relevant for researchers interested in accelerating their experimental cycles, exploring automated hypothesis generation, or understanding how agentic AI systems approach scientific discovery. Even if you do not use AI Scientist v2 directly, the concepts behind its design -- structured ideation, tree-based experiment exploration, automated writing -- can inform how you organize your own research.
+## System Architecture
+The AI Scientist v2 operates through a multi-stage pipeline:
+```
+Topic Description (.md)
+    |
+    v
+[Ideation Stage] --> Research Ideas (.json)
+    |
+    v
+[Experiment Stage] --> Best-First Tree Search (BFTS)
+    |                   - Multiple parallel workers
+    |                   - Automatic debugging
+    |                   - Experiment manager agent
+    v
+[Analysis Stage] --> Results + Figures
+    |
+    v
+[Writing Stage] --> Complete Paper (.pdf)
+    |
+    v
+[Review Stage] --> Automated Peer Review
+```
+### Key Components
+| Component | Role | Model Used |
+|-----------|------|------------|
+| Ideation Agent | Generates research hypotheses | Configurable (GPT-4o, Claude) |
+| Experiment Manager | Guides tree search exploration | Claude 3.5 Sonnet (default) |
+| Analysis Agent | Interprets results, creates figures | Same as experiment |
+| Writing Agent | Drafts full paper with LaTeX | o1-preview (default) |
+| Citation Agent | Finds and integrates references | GPT-4o (default) |
+| Review Agent | Simulates peer review | GPT-4o (default) |
+## Installation and Setup
+### Prerequisites
+- Linux with NVIDIA GPU (CUDA support required)
+- Python 3.11+
+- conda or mamba package manager
+### Step-by-Step Installation
+```bash
+# 1. Create and activate environment
+conda create -n ai_scientist python=3.11
+conda activate ai_scientist
+# 2. Install PyTorch with CUDA
+conda install pytorch torchvision torchaudio pytorch-cuda=12.4 \
+  -c pytorch -c nvidia
+# 3. Install PDF and LaTeX tools
+conda install anaconda::poppler conda-forge::chktex
+# 4. Clone and install
+git clone https://github.com/SakanaAI/AI-Scientist-v2.git
+cd AI-Scientist-v2
+pip install -r requirements.txt
+# 5. Set API keys
+export OPENAI_API_KEY="YOUR_OPENAI_KEY"
+export S2_API_KEY="YOUR_S2_KEY"  # Optional but recommended
+```
+## Running the Pipeline
+### Stage 1: Ideation
+Create a topic description file following this structure:
+```markdown
+# Title
+Exploring Efficient Fine-Tuning Methods for Large Language Models
+# Keywords
+LoRA, parameter-efficient fine-tuning, LLM adaptation, low-rank
+# TL;DR
+Investigate novel parameter-efficient methods for adapting LLMs to
+domain-specific tasks with minimal compute.
+# Abstract
+Large language models require substantial resources for full fine-tuning.
+Parameter-efficient methods like LoRA reduce this cost but may sacrifice
+performance. We seek to explore new approaches that balance efficiency
+and effectiveness across diverse downstream tasks.
+```
+Run ideation:
+```bash
+python ai_scientist/perform_ideation_temp_free.py \
+  --workshop-file "ai_scientist/ideas/my_topic.md" \
+  --model gpt-4o-2024-05-13 \
+  --max-num-generations 20 \
+  --num-reflections 5
+```
+This produces a JSON file with structured research ideas including hypotheses, proposed experiments, and related work.
+### Stage 2: Experiment and Paper Generation
+```bash
+python launch_scientist_bfts.py \
+  --load_ideas "ai_scientist/ideas/my_topic.json" \
+  --load_code \
+  --add_dataset_ref \
+  --model_writeup o1-preview-2024-09-12 \
+  --model_citation gpt-4o-2024-11-20 \
+  --model_review gpt-4o-2024-11-20 \
+  --model_agg_plots o3-mini-2025-01-31 \
+  --num_cite_rounds 20
+```
+### Configuration: bfts_config.yaml
+Key parameters to tune:
+```yaml
+agent:
+  num_workers: 3        # Parallel exploration paths
+  steps: 21             # Maximum nodes to explore
+  num_seeds: 3          # Initial root nodes
+search:
+  max_debug_depth: 3    # Max debug attempts per failing node
+  debug_prob: 0.5       # Probability of debugging vs. abandoning
+  num_drafts: 3         # Number of independent search trees
+```
+## Cost and Performance Estimates
+| Phase | Typical Cost | Duration |
+|-------|-------------|----------|
+| Ideation (20 ideas) | $2-5 | 15-30 min |
+| Experimentation (BFTS) | $15-20 | 2-6 hours |
+| Writing + Citation | $5 | 20-30 min |
+| Review | $1-2 | 5-10 min |
+| **Total per run** | **$23-32** | **3-7 hours** |
+## Integrating AI Scientist Principles Into Your Research
+Even without running the full system, you can adopt its methodological ideas:
+### Structured Ideation
+Use LLMs to brainstorm research directions systematically:
+```python
+prompt = """
+Given the research area of [TOPIC], generate 5 research ideas.
+For each idea, provide:
+1. Hypothesis (one sentence)
+2. Key experiment to test it
+3. Expected outcome if hypothesis is true
+4. Expected outcome if hypothesis is false
+5. Why this matters (impact)
+"""
+```
+### Tree-Based Experiment Design
+Instead of running experiments linearly, structure them as a tree:
+1. Start with 2-3 seed experiments (broad exploration)
+2. Evaluate results at each node
+3. Expand the most promising branches
+4. Prune branches that show diminishing returns
+5. Debug failures before abandoning (up to a depth limit)
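The expand-and-prune loop in the steps above can be sketched as a generic best-first search. This is an illustration of the idea only, not the AI Scientist v2 implementation: `best_first_search`, `expand`, and `score` are hypothetical names supplied by the caller, and the debugging step is omitted.

```python
import heapq
import itertools


def best_first_search(seeds, expand, score, max_steps=21, min_score=0.0):
    """Explore a tree of experiments, always expanding the highest-scoring
    unexplored node. Nodes scoring below min_score are pruned.

    Returns (best_node, best_score) among the nodes that were evaluated.
    """
    counter = itertools.count()  # tie-breaker so the heap never compares nodes
    frontier = []
    for node in seeds:
        # heapq is a min-heap, so scores are negated to pop the best first
        heapq.heappush(frontier, (-score(node), next(counter), node))

    best_node, best_score = None, float("-inf")
    steps = 0
    while frontier and steps < max_steps:
        negative_score, _, node = heapq.heappop(frontier)
        node_score = -negative_score
        steps += 1
        if node_score > best_score:
            best_node, best_score = node, node_score
        if node_score < min_score:
            continue  # prune: do not expand an unpromising branch
        for child in expand(node):
            heapq.heappush(frontier, (-score(child), next(counter), child))
    return best_node, best_score
```

In practice `score` would be an evaluation metric from a completed run and `expand` would propose follow-up experiments. The `max_steps` budget plays the same role as a cap on the number of nodes explored.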
+### Automated Literature Checks
+Use Semantic Scholar API to check novelty before investing in an idea:
+```python
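+import requests
+def check_novelty(query, max_results=10, api_key=None):
+    """Return existing papers that match an idea, for a manual novelty check."""
+    url = "https://api.semanticscholar.org/graph/v1/paper/search"
+    params = {"query": query, "limit": max_results,
+              "fields": "title,year,citationCount"}
+    # Send the key only when one is supplied; anonymous access also works
+    headers = {"x-api-key": api_key} if api_key else {}
+    resp = requests.get(url, params=params, headers=headers, timeout=30)
+    resp.raise_for_status()  # surface HTTP 429 and other errors instead of returning []
+    papers = resp.json().get('data', [])
+    return papers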
+```
+## Best Practices
+- **Always disclose AI involvement.** If AI Scientist generates any part of your paper, disclose this clearly in the methods section.
+- **Validate all generated results.** Automated systems can produce plausible but incorrect code. Review experiments manually.
+- **Use sandboxed environments.** The system executes LLM-generated code. Run it in Docker containers.
+- **Start with well-defined topics.** Narrow, concrete research questions produce better results than broad ones.
+- **Iterate on the topic description.** The quality of the input topic file strongly influences output quality.
+- **Combine with human judgment.** Use AI Scientist for ideation and draft generation, but apply human expertise for final decisions.
+## References
+- [AI-Scientist-v2 Repository](https://github.com/SakanaAI/AI-Scientist-v2) -- Source code (2,229+ stars)
+- [AI Scientist v2 Paper](https://pub.sakana.ai/ai-scientist-v2/paper) -- Workshop-Level Automated Scientific Discovery via Agentic Tree Search
+- [AI Scientist Blog Post](https://sakana.ai/ai-scientist-first-publication/) -- Sakana AI announcement
+- [AIDE: ML Engineering Agent](https://github.com/WecoAI/aideml) -- Foundation for the tree search component
+- [Semantic Scholar API](https://api.semanticscholar.org/) -- Literature search API