npm - @wentorai/research-plugins - Versions diffs - 1.0.0 - Mend

@wentorai/research-plugins 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (252)

package/skills/literature/fulltext/preprint-servers-guide/SKILL.md ADDED

@@ -0,0 +1,128 @@
+---
+name: preprint-servers-guide
+description: "Guide to preprint servers across scientific disciplines"
+metadata:
+  openclaw:
+    emoji: "paper"
+    category: "literature"
+    subcategory: "fulltext"
+    keywords: ["preprint server", "preprint submission", "arXiv", "bioRxiv", "open access"]
+    source: "wentor-research-plugins"
+---
+# Preprint Servers Guide
+A comprehensive guide to preprint servers across all major academic disciplines, covering submission workflows, licensing, and programmatic access.
+## What Are Preprints?
+Preprints are complete research manuscripts shared publicly before formal peer review. They enable rapid dissemination of findings, establish priority of discovery, and invite community feedback before journal submission.
+Key characteristics:
+- **Not peer reviewed** (but increasingly moderated for basic quality)
+- **Freely accessible** to anyone
+- **Citable** with a DOI
+- **Versioned** (authors can post revisions)
+- **Compatible** with most journal submissions (check journal policy first)
+## Major Preprint Servers by Discipline
+| Server | Disciplines | Operator | Moderation | DOI Prefix |
+|--------|-------------|----------|------------|------------|
+| arXiv | Physics, Math, CS, Econ, EE, Stats, Q-Bio | Cornell | Light screening | 10.48550 |
+| bioRxiv | Biology (all subfields) | Cold Spring Harbor | Basic screening | 10.1101 |
+| medRxiv | Clinical/health sciences | Yale/BMJ/CSHL | Enhanced screening | 10.1101 |
+| ChemRxiv | Chemistry | ACS (with RSC, GDCh, CCS, CSJ) | Moderate screening | 10.26434 |
+| EarthArXiv | Earth/planetary sciences | Community-led | Light screening | 10.31223 |
+| PsyArXiv | Psychology | COS/OSF | Light screening | 10.31234 |
+| SocArXiv | Social sciences | COS/OSF | Light screening | 10.31235 |
+| SSRN | Social sciences, law, economics | Elsevier | Minimal | 10.2139 |
+| engrXiv | Engineering | COS/OSF | Light screening | 10.31224 |
+| EdArXiv | Education | COS/OSF | Light screening | 10.35542 |
+| Preprints.org | Multidisciplinary | MDPI | Basic screening | 10.20944 |
+| Research Square | Multidisciplinary | Springer Nature | Basic screening (In Review option) | 10.21203 |
+| TechRxiv | Electrical eng., CS | IEEE | Moderate | 10.36227 |
+## Submission Workflow
+### arXiv Submission
+1. **Create an account** at arxiv.org and get endorsed if required (first-time submitters to a category may need an endorsement from an established author).
+2. **Prepare your manuscript**:
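+   - LaTeX source (preferred): upload the `.tex` files, figures, and the `.bbl` file (arXiv does not run BibTeX), either individually or as a single `.zip`/`.tar.gz` archive
+   - PDF: accepted only if it was not produced from TeX/LaTeX; if you have TeX source, arXiv requires you to submit it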
+3. **Select categories**: Choose a primary category (e.g., `cs.CL`) and optional cross-lists.
+4. **Submit metadata**: Title, abstract, authors, comments, journal-ref (if applicable).
+5. **Wait for announcement**: The weekday submission deadline is 14:00 ET; papers submitted before it appear in the next announcement (20:00 ET), and later submissions roll over to the following cycle.
+6. **Receive arXiv ID**: Format `YYMM.NNNNN` (e.g., `2401.12345`).
+```bash
+# Download arXiv paper PDF programmatically
+curl -L -o paper.pdf https://arxiv.org/pdf/2401.12345
+# Get metadata via arXiv API (returns an Atom XML feed)
+curl "https://export.arxiv.org/api/query?id_list=2401.12345"
+```
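The metadata call above returns an Atom XML feed rather than JSON. The following is a minimal sketch for pulling the usual fields out of that feed with the standard library. It assumes each entry's `<id>` is an `abs` URL ending in a new-style ID (`YYMM.NNNNN`, or four digits after the dot before 2015) with an optional version suffix. `parse_arxiv_feed` is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}
# New-style arXiv ID with an optional version suffix, anchored at the end of the URL
ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?$")

def parse_arxiv_feed(xml_text: str) -> list:
    """Extract ID, version, title, publication date, and authors from an arXiv Atom feed."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        # <id> looks like http://arxiv.org/abs/2401.12345v1
        abs_url = entry.findtext("atom:id", default="", namespaces=ATOM)
        match = ARXIV_ID.search(abs_url)
        if match is None:
            continue  # old-style IDs (e.g. hep-th/9901001) and error entries are skipped
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        papers.append({
            "id": match.group(1),
            "version": match.group(2),
            # Titles in the feed can contain line breaks; collapse all whitespace
            "title": " ".join(title.split()),
            "published": entry.findtext("atom:published", namespaces=ATOM),
            "authors": [a.findtext("atom:name", namespaces=ATOM)
                        for a in entry.findall("atom:author", ATOM)],
        })
    return papers
```

Fetch the XML with `curl` or `urllib.request` and pass the response text in. Old-style identifiers are deliberately skipped rather than parsed.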
+### bioRxiv/medRxiv Submission
+1. **Create an account** at biorxiv.org or medrxiv.org.
+2. **Upload manuscript** as a single Word or PDF file.
+3. **Add metadata**: Title, authors with ORCIDs, abstract, subject area.
+4. **Select license**: CC-BY, CC-BY-NC, CC-BY-ND, CC-BY-NC-ND, CC0, or no reuse without permission.
+5. **Screening**: bioRxiv screens for plagiarism, dual submission, and non-scientific content. medRxiv applies additional clinical content screening.
+6. **Posting**: Typically within 24-48 hours after screening.
+```python
+# bioRxiv API: list preprints posted in a date range (first page of results)
+import requests
+# URL pattern: /details/{server}/{start}/{end}/{cursor}/{format}
+response = requests.get(
+    "https://api.biorxiv.org/details/biorxiv/2024-01-01/2024-01-31/0/json",
+    timeout=30,
+)
+response.raise_for_status()
+papers = response.json()["collection"]
+for p in papers[:5]:
+    print(f"[{p['date']}] {p['title']} (doi: {p['doi']})")
+```
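The details endpoint returns results in pages, with the cursor as a path segment. Harvesting a whole month therefore means walking the cursor forward. This sketch rests on two assumptions: a page holds at most 100 records (`PAGE_SIZE`), and a short page marks the end. `iter_preprints` and `fetch_json` are hypothetical names, and `fetch` is injectable so the paging logic can be tested offline.

```python
import json
import urllib.request
from typing import Callable, Iterator

API = "https://api.biorxiv.org/details"
PAGE_SIZE = 100  # assumption: maximum records returned per call

def fetch_json(url: str) -> dict:
    """Download and decode one page of API results."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

def iter_preprints(server: str, start: str, end: str,
                   fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield every record posted between start and end (YYYY-MM-DD), page by page."""
    cursor = 0
    while True:
        page = fetch(f"{API}/{server}/{start}/{end}/{cursor}/json")
        records = page.get("collection", [])
        yield from records
        if len(records) < PAGE_SIZE:
            break  # a short (or empty) page means there is nothing left
        cursor += len(records)
```

The same function should work for medRxiv by passing `"medrxiv"` as `server`, since both servers share the API.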
+## Licensing Options
+| License | Allows Reuse | Allows Commercial | Allows Derivatives | Requires Attribution |
+|---------|-------------|-------------------|--------------------|---------------------|
+| CC-BY | Yes | Yes | Yes | Yes |
+| CC-BY-NC | Yes | No | Yes | Yes |
+| CC-BY-ND | Yes | Yes | No | Yes |
+| CC-BY-NC-ND | Yes | No | No | Yes |
+| CC0 | Yes | Yes | Yes | No |
+**Recommendation**: Use CC-BY for maximum openness and compatibility with funder mandates (NIH, Wellcome Trust, ERC). Use CC-BY-NC if you want to restrict commercial reuse.
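For a tool that suggests a license, the table above maps directly onto a lookup. This is a minimal sketch. `licenses_allowing` is a hypothetical helper, and the permissions are copied from the table rather than derived from the license texts.

```python
# Permissions encoded from the licensing table
LICENSES = {
    "CC-BY":       {"commercial": True,  "derivatives": True,  "attribution": True},
    "CC-BY-NC":    {"commercial": False, "derivatives": True,  "attribution": True},
    "CC-BY-ND":    {"commercial": True,  "derivatives": False, "attribution": True},
    "CC-BY-NC-ND": {"commercial": False, "derivatives": False, "attribution": True},
    "CC0":         {"commercial": True,  "derivatives": True,  "attribution": False},
}

def licenses_allowing(commercial: bool = False, derivatives: bool = False) -> list:
    """Return the licenses that permit every kind of reuse you require."""
    return [name for name, terms in LICENSES.items()
            if (terms["commercial"] or not commercial)
            and (terms["derivatives"] or not derivatives)]
```

For example, `licenses_allowing(commercial=True, derivatives=True)` returns `["CC-BY", "CC0"]`.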
+## Journal Policies on Preprints
+Most major publishers now accept manuscripts previously posted as preprints:
+- **Accepts preprints**: Nature, Science, PNAS, Cell, Lancet, BMJ, PLOS, eLife, all IEEE journals, most ACM venues
+- **May not accept preprints**: Some society journals in certain fields (check Sherpa Romeo at v2.sherpa.ac.uk/romeo for specific policies)
+Important considerations:
+- Some journals require you to update the preprint with a link to the published version.
+- Dual posting on multiple preprint servers may violate policies.
+- Some clinical journals have stricter rules on preprints or on media coverage before publication, which matters most for medRxiv postings.
+## Programmatic Access Comparison
+| Server | API Available | Bulk Download | OAI-PMH | Rate Limit |
+|--------|--------------|---------------|---------|------------|
+| arXiv | Yes (Atom) | Yes (S3 bulk) | Yes | 1 req/3 sec |
+| bioRxiv | Yes (REST) | Yes (requester-pays S3 bucket for text mining) | No | Polite use |
+| SSRN | No public API | No | No | N/A |
+| OSF Preprints | Yes (OSF API; also via SHARE) | Yes | Yes | Polite use |
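When one script talks to several of these servers, a per-host throttle keeps each within its limit. This is a minimal sketch. The `Throttle` class is hypothetical, and only the arXiv interval comes from the table above; the other values are placeholders for "polite use", not documented limits.

```python
import time

# Minimum seconds between requests per host (arXiv from the table; others are placeholders)
MIN_INTERVAL = {"export.arxiv.org": 3.0, "api.biorxiv.org": 1.0, "api.osf.io": 1.0}

class Throttle:
    """Ensure at least `interval` seconds pass between consecutive requests to a host."""

    def __init__(self, intervals: dict, default: float = 1.0):
        self.intervals = intervals
        self.default = default
        self.last = {}  # host -> time.monotonic() of the previous request

    def wait(self, host: str) -> float:
        """Sleep as needed before a request to `host`; return the seconds slept."""
        interval = self.intervals.get(host, self.default)
        delay = 0.0
        if host in self.last:
            delay = max(0.0, self.last[host] + interval - time.monotonic())
            if delay > 0:
                time.sleep(delay)
        self.last[host] = time.monotonic()
        return delay
```

Create one `Throttle(MIN_INTERVAL)` per process and call `wait("export.arxiv.org")` immediately before each request to that host.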
+## Best Practices
+1. **Post before or at submission**: Maximize the time for community feedback.
+2. **Use ORCIDs**: Link your preprint to your ORCID profile for discoverability.
+3. **Update with journal DOI**: After acceptance, add a comment or new version linking to the published article.
+4. **Choose the right server**: Use the discipline-specific server for maximum visibility within your community.
+5. **Check funder requirements**: Some funders encourage or require preprint posting (e.g., NIH encourages preprints, and Wellcome requires them for research relevant to public health emergencies).

package/skills/literature/fulltext/repository-harvesting-guide/SKILL.md ADDED

@@ -0,0 +1,207 @@
+---
+name: repository-harvesting-guide
+description: "Harvest papers from institutional and subject repositories at scale"
+metadata:
+  openclaw:
+    emoji: "inbox_tray"
+    category: "literature"
+    subcategory: "fulltext"
+    keywords: ["repository harvesting", "OAI-PMH", "institutional repository", "open access", "metadata harvesting", "preprints"]
+    source: "wentor-research-plugins"
+---
+# Repository Harvesting Guide
+A skill for systematically harvesting research papers and metadata from institutional repositories, preprint servers, and open access archives. Covers the OAI-PMH protocol, API-based harvesting, and tools for building comprehensive literature datasets.
+## Repository Landscape
+### Types of Repositories
+```
+Institutional Repositories (IR):
+  - Run by universities to archive their researchers' output
+  - Examples: DSpace, EPrints, Fedora-based systems
+  - Discovery: OpenDOAR directory (v2.sherpa.ac.uk/opendoar)
+Subject Repositories:
+  - Discipline-specific archives
+  - arXiv (physics, CS, math), bioRxiv, SSRN, RePEc, EarthArXiv
+Aggregators:
+  - Harvest from many repositories into a single search interface
+  - BASE (Bielefeld Academic Search Engine)
+  - CORE (core.ac.uk; hundreds of millions of metadata records, tens of millions with full text)
+  - OpenAIRE (European research output)
+```
+## OAI-PMH Harvesting
+### Protocol Overview
+The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is the standard protocol for harvesting metadata from repositories. Most institutional repositories support it.
+```python
+import time
+import urllib.parse
+import urllib.request
+import xml.etree.ElementTree as ET
+from typing import Optional
+OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
+def harvest_oai_records(base_url: str, metadata_prefix: str = "oai_dc",
+                        set_spec: Optional[str] = None,
+                        from_date: Optional[str] = None,
+                        delay: float = 1.0) -> list:
+    """
+    Harvest record headers from an OAI-PMH endpoint using ListRecords.
+    Args:
+        base_url: The OAI-PMH base URL of the repository
+        metadata_prefix: Metadata format (oai_dc, datacite, mets, etc.)
+        set_spec: Optional set to restrict harvesting
+        from_date: Optional date filter (YYYY-MM-DD)
+        delay: Seconds to wait between paginated requests
+    """
+    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
+    if set_spec:
+        params["set"] = set_spec
+    if from_date:
+        params["from"] = from_date
+    url = base_url + "?" + urllib.parse.urlencode(params)
+    records = []
+    while url:
+        with urllib.request.urlopen(url, timeout=60) as response:
+            root = ET.parse(response).getroot()
+        # An empty result is reported as an OAI error, not as an empty list
+        error = root.find("oai:error", OAI_NS)
+        if error is not None:
+            if error.get("code") == "noRecordsMatch":
+                break
+            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
+        for record in root.findall(".//oai:record", OAI_NS):
+            header = record.find("oai:header", OAI_NS)
+            # Deleted records appear as header-only tombstones; skip them
+            if header.get("status") == "deleted":
+                continue
+            records.append({
+                "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
+                "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS)
+            })
+        # Handle resumption token for pagination; the token replaces all other arguments
+        token_elem = root.find(".//oai:resumptionToken", OAI_NS)
+        if token_elem is not None and token_elem.text:
+            url = base_url + "?" + urllib.parse.urlencode(
+                {"verb": "ListRecords", "resumptionToken": token_elem.text})
+            time.sleep(delay)
+        else:
+            url = None
+    return records
+```
+### OAI-PMH Verbs
+```
+Identify             Get repository information
+ListSets             List available sets (collections)
+ListMetadataFormats  List supported metadata formats
+ListIdentifiers      List record headers (lightweight)
+ListRecords          List full metadata records
+GetRecord            Retrieve a single record by identifier
+```
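Before a full harvest, an `Identify` request (`base_url + "?verb=Identify"`) confirms that the endpoint is live and reports the date granularity it accepts. This sketch parses that response using the element names from the OAI-PMH 2.0 schema. `parse_identify` is a hypothetical helper, and fetching the XML is left to `urllib.request`.

```python
import xml.etree.ElementTree as ET

OAI = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def parse_identify(xml_text: str) -> dict:
    """Pull the fields a harvester needs from an OAI-PMH Identify response."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", OAI)
    if error is not None:
        raise ValueError(f"OAI-PMH error {error.get('code')}: {error.text}")
    ident = root.find("oai:Identify", OAI)
    return {
        "name": ident.findtext("oai:repositoryName", namespaces=OAI),
        "base_url": ident.findtext("oai:baseURL", namespaces=OAI),
        # No record is older than this, so it bounds any "from" argument
        "earliest": ident.findtext("oai:earliestDatestamp", namespaces=OAI),
        # YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ: the finest "from"/"until" the server accepts
        "granularity": ident.findtext("oai:granularity", namespaces=OAI),
        # no, transient, or persistent: whether the repository tracks deletions
        "deleted_record": ident.findtext("oai:deletedRecord", namespaces=OAI),
    }
```

A `deleted_record` value of `no` means incremental harvests cannot detect removals, so a periodic full re-harvest may be needed.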
+## API-Based Harvesting
+### CORE API
+```python
+import json
+import os
+import urllib.parse
+import urllib.request
+def search_core_api(query: str, limit: int = 100) -> list:
+    """
+    Search CORE.ac.uk (API v3) for open access papers.
+    Args:
+        query: Search query string
+        limit: Maximum number of results
+    """
+    api_key = os.environ["CORE_API_KEY"]
+    url = "https://api.core.ac.uk/v3/search/works"
+    headers = {"Authorization": f"Bearer {api_key}"}
+    params = urllib.parse.urlencode({"q": query, "limit": limit})
+    req = urllib.request.Request(f"{url}?{params}", headers=headers)
+    with urllib.request.urlopen(req, timeout=60) as response:
+        data = json.loads(response.read())
+    results = []
+    for item in data.get("results", []):
+        results.append({
+            "title": item.get("title"),
+            "doi": item.get("doi"),
+            "year": item.get("yearPublished"),
+            "download_url": item.get("downloadUrl"),
+            # Full-text URLs at the source repositories
+            "source_fulltext_urls": item.get("sourceFulltextUrls", [])
+        })
+    return results
+```
+### Crossref and Unpaywall APIs
+```python
+import json
+import os
+import urllib.error
+import urllib.parse
+import urllib.request
+def find_open_access_version(doi: str) -> dict:
+    """
+    Check Unpaywall for an open access version of a paper.
+    Args:
+        doi: The DOI of the paper
+    """
+    query = urllib.parse.urlencode({"email": os.environ["CONTACT_EMAIL"]})
+    url = f"https://api.unpaywall.org/v2/{doi}?{query}"
+    try:
+        with urllib.request.urlopen(url, timeout=30) as response:
+            data = json.loads(response.read())
+    except urllib.error.HTTPError as err:
+        if err.code == 404:  # DOI not found in Unpaywall
+            return {"is_oa": False, "oa_status": None, "pdf_url": None, "host_type": None}
+        raise
+    # best_oa_location is null when no open access copy is known
+    best_oa = data.get("best_oa_location") or {}
+    return {
+        "is_oa": data.get("is_oa", False),
+        "oa_status": data.get("oa_status"),
+        "pdf_url": best_oa.get("url_for_pdf"),
+        "host_type": best_oa.get("host_type")
+    }
+```
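Crossref's REST API complements Unpaywall: it returns publisher-supplied metadata, including license URLs and full-text links, for a DOI. This is a minimal sketch. It reuses the `CONTACT_EMAIL` variable from the Unpaywall example and passes it as `mailto`, which routes requests to Crossref's "polite" pool. `fetch_crossref` and `summarize_crossref` are hypothetical helpers, and the field names follow the Crossref works schema.

```python
import json
import os
import urllib.parse
import urllib.request

def fetch_crossref(doi: str) -> dict:
    """Fetch the Crossref work record (the "message" object) for a DOI."""
    query = urllib.parse.urlencode({"mailto": os.environ["CONTACT_EMAIL"]})
    url = f"https://api.crossref.org/works/{urllib.parse.quote(doi)}?{query}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["message"]

def summarize_crossref(message: dict) -> dict:
    """Reduce a Crossref work record to the fields a harvesting pipeline usually keeps."""
    # issued.date-parts looks like [[2020, 5, 1]]; some records carry [[None]]
    parts = (message.get("issued", {}).get("date-parts") or [[None]])[0] or [None]
    return {
        "doi": message.get("DOI"),
        "title": (message.get("title") or [None])[0],
        "journal": (message.get("container-title") or [None])[0],
        "year": parts[0],
        # License URLs help decide whether full text may be redistributed
        "licenses": [lic.get("URL") for lic in message.get("license", [])],
        # Publisher full-text links (often intended for text and data mining)
        "fulltext_links": [link.get("URL") for link in message.get("link", [])],
    }
```

Typical use is `summarize_crossref(fetch_crossref(doi))`. Keeping the network call separate from the parsing makes the parser easy to test on saved responses.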
+## Building a Harvesting Pipeline
+### Workflow for Systematic Collection
+```
+1. Identify target repositories
+   - Use OpenDOAR to find IRs in your research area
+   - List preprint servers relevant to your discipline
+2. Test OAI-PMH endpoints
+   - Send an Identify request to verify the endpoint is active
+   - Check ListMetadataFormats for available schemas
+3. Harvest incrementally
+   - Use the "from" parameter to harvest only new records
+   - Store the last harvest date for each repository
+   - Respect each repository's rate limits and any HTTP 503 Retry-After responses (about one request per second is a common polite default)
+4. Deduplicate
+   - Match records by DOI when available
+   - Use title + author fuzzy matching for records without DOIs
+   - Flag duplicates rather than deleting (keep provenance)
+5. Store and index
+   - Save metadata in a structured format (JSON, SQLite, or CSV)
+   - Build a local search index for efficient retrieval
+```
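The deduplication step above can be sketched with the standard library's `difflib`. The record fields (`id`, `doi`, `title`, `first_author`) and the 0.93 similarity threshold are hypothetical choices for illustration, not values from any particular tool. Duplicates are flagged rather than deleted, which preserves provenance.

```python
import difflib
import re

def normalize_title(title: str) -> str:
    """Lowercase and replace punctuation with spaces so formatting differences do not matter."""
    return " ".join(re.sub(r"[^\w\s]", " ", title.lower()).split())

def flag_duplicates(records: list, threshold: float = 0.93) -> list:
    """Set rec["duplicate_of"] to the id of the first matching record, or None.

    Records are compared by DOI when both have one (DOIs are case-insensitive);
    otherwise by first-author surname plus fuzzy title similarity.
    """
    canonical = []  # first-seen copy of each distinct work
    for rec in records:
        rec["duplicate_of"] = None
        doi = (rec.get("doi") or "").lower()
        title = normalize_title(rec.get("title") or "")
        author = (rec.get("first_author") or "").lower()
        for canon in canonical:
            canon_doi = (canon.get("doi") or "").lower()
            if doi and canon_doi:
                same = doi == canon_doi
            else:
                canon_title = normalize_title(canon.get("title") or "")
                ratio = difflib.SequenceMatcher(None, title, canon_title).ratio()
                same = author == (canon.get("first_author") or "").lower() and ratio >= threshold
            if same:
                rec["duplicate_of"] = canon["id"]
                break
        if rec["duplicate_of"] is None:
            canonical.append(rec)
    return records
```

This comparison is quadratic in the number of records. For large harvests, group candidates first (for example by normalized DOI, or by first author and year) before running fuzzy matching.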
+## Ethical and Legal Considerations
+- Always respect robots.txt and rate limits
+- Harvesting metadata is generally permitted; bulk full-text download may require permission
+- Check each repository's terms of use
+- Use harvested data for research purposes, not commercial redistribution
+- Attribute the source repository in any publications using harvested data

package/skills/literature/fulltext/unpaywall-api/SKILL.md ADDED

@@ -0,0 +1,113 @@
+---
+name: unpaywall-api
+description: "Find free legal full-text versions of scholarly articles via Unpaywall"
+metadata:
+  openclaw:
+    emoji: "🔍"
+    category: "literature"
+    subcategory: "fulltext"
+    keywords: ["full-text retrieval", "open access", "journal copyright policy", "self-archiving"]
+    source: "https://unpaywall.org/products/api"
+    requires:
+      env: ["UNPAYWALL_EMAIL"]
+---
+# Unpaywall API Guide
+## Overview
+Unpaywall is a free, open database of over 40 million free scholarly articles. Built by the nonprofit OurResearch, Unpaywall indexes legal open access (OA) copies of papers from thousands of institutional repositories, preprint servers, publisher websites, and government archives, making it one of the most widely used sources for finding freely available versions of paywalled academic literature.
+Researchers, librarians, and tool developers use Unpaywall to locate open access copies of papers, assess the OA status of publications, and integrate OA discovery into their workflows. The database is updated daily from repository and publisher sources. Unpaywall classifies OA copies as gold (published in a fully OA journal), green (free copy in a repository), hybrid (openly licensed article in a subscription journal), or bronze (free to read on the publisher site without an open license).
+The API requires only an email address for authentication and is free for non-commercial use with a generous rate limit of 100,000 requests per day.
+## Authentication
+Authentication is via an email address passed as a query parameter. No API key or registration is needed; just provide a valid email:
+```
+?email=your@email.com
+```
+This email identifies your application and gives Unpaywall a way to contact you about your usage. Requests without a valid email address are rejected. For volumes beyond the daily limit, use the database snapshot or Data Feed rather than the REST API.
+## Core Endpoints
+### DOI Lookup: Find Open Access for a Paper
+- **URL**: `GET https://api.unpaywall.org/v2/{doi}`
+- **Parameters**:
+  | Param | Type | Required | Description |
+  |-------|------|----------|-------------|
+  | doi | string | Yes | The DOI of the paper (URL-encoded in path) |
+  | email | string | Yes | Your email address |
+- **Example**:
+  ```bash
+  curl "https://api.unpaywall.org/v2/10.1038/nature12373?email=user@example.com"
+  ```
+- **Response**: JSON with comprehensive OA information:
+  - `is_oa`: boolean indicating if any OA version exists
+  - `best_oa_location`: the best available OA copy with `url`, `url_for_pdf`, `evidence`, `host_type`, `license`, and `version`
+  - `oa_locations`: array of all known OA copies
+  - `oa_status`: gold, green, hybrid, bronze, or closed
+  - `title`, `doi`, `year`, `genre`, `journal_name`, `publisher`
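The `best_oa_location` sometimes has only a landing-page `url`, with `url_for_pdf` set to null, while another entry in `oa_locations` does have a PDF. This minimal sketch falls back through the locations in that case. `best_pdf` is a hypothetical helper that operates on one decoded JSON response.

```python
from typing import Optional

def best_pdf(record: dict) -> Optional[tuple]:
    """Return (pdf_url, host_type, version) for an Unpaywall record, or None if no PDF is known.

    Prefers best_oa_location, then falls back to any other OA location with a PDF link.
    """
    candidates = [record.get("best_oa_location")] + list(record.get("oa_locations") or [])
    for loc in candidates:
        if loc and loc.get("url_for_pdf"):
            return loc["url_for_pdf"], loc.get("host_type"), loc.get("version")
    return None
```

The returned `version` (`submittedVersion`, `acceptedVersion`, or `publishedVersion`) tells you whether the PDF is the final published text.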
+### Batch DOI Lookup: Multiple Papers
+- **URL**: `GET https://api.unpaywall.org/v2/{doi}` (repeated per DOI)
+- **Parameters**:
+  | Param | Type | Required | Description |
+  |-------|------|----------|-------------|
+  | doi | string | Yes | One DOI per request (batch via multiple requests) |
+  | email | string | Yes | Your email address |
+- **Example**:
+  ```bash
+  # Process multiple DOIs sequentially
+  for doi in "10.1038/nature12373" "10.1126/science.aaa8685" "10.1016/j.cell.2015.05.002"; do
+    curl -s "https://api.unpaywall.org/v2/$doi?email=user@example.com" | jq '{doi: .doi, is_oa: .is_oa, oa_status: .oa_status, best_url: .best_oa_location.url}'
+    sleep 1  # at most ~86,400 requests/day, under the 100,000/day cap
+  done
+  ```
+- **Response**: Same JSON structure as single lookup, for each DOI.
+### Data Feed: Bulk Access
+For large-scale analyses, Unpaywall provides a complete database snapshot and a weekly data feed, rather than requiring millions of individual API calls. Access is available at https://unpaywall.org/products/data-feed for registered users.
+## Rate Limits
+The API allows 100,000 requests per day (about 1.16 requests per second if spread evenly over 24 hours). The documented limit is daily rather than per-second, but pacing sustained jobs is still the polite default, and requests beyond the cap may be rejected (for example with HTTP 429). For analyses requiring more than 100K lookups, use the Unpaywall Data Feed (database snapshot) instead.
+## Common Patterns
+### Check OA Status for a Reading List
+Determine which papers in your reading list have freely available versions:
+```bash
+curl -s "https://api.unpaywall.org/v2/10.1038/s41586-021-03819-2?email=user@example.com" | jq '{title: .title, is_oa: .is_oa, oa_status: .oa_status, pdf: .best_oa_location.url_for_pdf}'
+```
+### Find the Best Available PDF
+Get a direct link to the best open access PDF for a paper:
+```bash
+curl -s "https://api.unpaywall.org/v2/10.1145/3292500.3330672?email=user@example.com" | jq '.best_oa_location | {url: .url, pdf: .url_for_pdf, version: .version, license: .license}'
+```
+### Audit Open Access Compliance for a Grant
+Check whether publications from a funded project comply with OA mandates:
+```bash
+# For each publication DOI from the grant
+curl -s "https://api.unpaywall.org/v2/10.1038/nature12373?email=user@example.com" | jq '{doi: .doi, title: .title, is_oa: .is_oa, oa_status: .oa_status, locations: [.oa_locations[] | {host: .host_type, license: .license, version: .version}]}'
+```
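To audit a whole grant, save each DOI's JSON response and summarize the set. This is a minimal sketch. What counts as compliant is funder-specific; flagging OA copies whose best location has no license is an assumption, included because some mandates require an open license. `audit_oa` is a hypothetical helper.

```python
from collections import Counter

def audit_oa(records: list) -> dict:
    """Summarize decoded Unpaywall responses for a set of grant publications."""
    statuses = Counter(r.get("oa_status") or "unknown" for r in records)
    # Papers with no OA copy at all need follow-up (e.g. a repository deposit)
    closed = [r.get("doi") for r in records if not r.get("is_oa")]
    # OA papers whose best copy carries no license may still fail a license requirement
    unlicensed = [r.get("doi") for r in records
                  if r.get("is_oa") and not (r.get("best_oa_location") or {}).get("license")]
    return {"by_status": dict(statuses), "closed": closed, "oa_without_license": unlicensed}
```

Bronze copies typically land in `oa_without_license`, because they are free to read without an open license.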
+## References
+- Official documentation: https://unpaywall.org/products/api
+- Unpaywall data format: https://unpaywall.org/data-format
+- OurResearch: https://ourresearch.org/

package/skills/literature/metadata/altmetrics-guide/SKILL.md ADDED

@@ -0,0 +1,132 @@
+---
+name: altmetrics-guide
+description: "Guide to altmetrics and research impact beyond traditional citations"
+metadata:
+  openclaw:
+    emoji: "chart"
+    category: "literature"
+    subcategory: "metadata"
+    keywords: ["altmetrics", "attention score", "social impact", "online mentions", "academic metrics"]
+    source: "wentor-research-plugins"
+---
+# Altmetrics Guide
+Understand and use alternative metrics (altmetrics) to measure the broader impact and reach of research outputs beyond traditional citation counts.
+## What Are Altmetrics?
+Altmetrics capture the online attention and engagement that research receives across diverse platforms. Unlike citation-based metrics (which can take years to accumulate), altmetrics provide near-real-time signals of how research is being discussed, shared, and used.
+| Source Category | Examples | What It Measures |
+|----------------|----------|------------------|
+| Social media | Twitter/X, Facebook, Reddit, Weibo | Public discussion and sharing |
+| News & blogs | Mainstream media, science blogs | Media coverage and science communication |
+| Policy documents | Government reports, clinical guidelines | Policy relevance |
+| Reference managers | Mendeley, Zotero readership | Academic readership and interest |
+| Wikipedia | Article citations | Educational and encyclopedic use |
+| Peer review | Publons, post-publication review | Formal and informal peer evaluation |
+| Patents | Patent citations | Commercial and industrial relevance |
+## Key Altmetric Providers and Scores
+### Altmetric.com Attention Score
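+The Altmetric Attention Score is a weighted count of online mentions. Altmetric also applies modifiers (for example, adjusting social media mentions for the reach and potential bias of the posting account) and rounds the result, so the weights below are defaults rather than a complete formula: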
+| Source | Weight | Rationale |
+|--------|--------|-----------|
+| News outlets | 8 | Editorial curation, wide audience |
+| Blog posts | 5 | Expert commentary |
+| Wikipedia | 3 | Encyclopedic significance |
+| Policy documents (per source) | 3 | Real-world impact |
+| Patents | 3 | Commercial and industrial relevance |
+| Twitter/X posts | 1 | Broad sharing but low barrier |
+| Facebook posts | 0.25 | General public engagement |
+| Reddit posts | 0.25 | Community discussion |
+| Mendeley readers | 0 (separate) | Tracked but not in score |
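A rough weighted sum using the table's weights shows how different sources trade off. This is a simplified sketch, not Altmetric's algorithm. It ignores the modifiers and rounding described above, so it will not reproduce published scores. The source keys are hypothetical labels.

```python
# Default weights copied from the table; Altmetric's modifiers are not modeled
WEIGHTS = {"news": 8, "blogs": 5, "wikipedia": 3, "policy": 3,
           "twitter": 1, "facebook": 0.25, "reddit": 0.25, "mendeley": 0}

def approx_attention_score(mentions: dict) -> float:
    """Weighted sum of mention counts; sources not in WEIGHTS contribute nothing."""
    return sum(WEIGHTS.get(source, 0) * count for source, count in mentions.items())
```

For example, `approx_attention_score({"news": 2, "blogs": 1, "twitter": 10, "facebook": 4, "mendeley": 120})` gives 16 + 5 + 10 + 1 = 32.0. The 120 Mendeley readers add nothing, and two news stories outweigh ten posts.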
+### PlumX Metrics (Elsevier)
+PlumX organizes metrics into five categories:
+1. **Usage**: Downloads, views, library holdings
+2. **Captures**: Bookmarks, readers, watchers
+3. **Mentions**: Blog posts, news articles, reviews, Wikipedia
+4. **Social Media**: Tweets, Facebook likes, Reddit activity
+5. **Citations**: Scopus, CrossRef, patent citations
+### Dimensions Badge
+Dimensions provides citation counts alongside altmetric-style attention data, integrating grants, patents, clinical trials, and policy documents.
+## Querying the Altmetric.com API
+```python
+import requests
+# Look up altmetrics by DOI
+doi = "10.1038/s41586-021-03819-2"
+response = requests.get(f"https://api.altmetric.com/v1/doi/{doi}", timeout=30)
+if response.status_code == 200:
+    data = response.json()
+    print(f"Title: {data.get('title')}")
+    print(f"Altmetric Score: {data.get('score')}")
+    print(f"Twitter/X accounts: {data.get('cited_by_tweeters_count', 0)}")
+    print(f"News mentions: {data.get('cited_by_msm_count', 0)}")
+    print(f"Blog mentions: {data.get('cited_by_feeds_count', 0)}")
+    print(f"Wikipedia mentions: {data.get('cited_by_wikipedia_count', 0)}")
+    print(f"Mendeley readers: {data.get('readers', {}).get('mendeley', 0)}")
+    print(f"Detail URL: {data.get('details_url')}")
+elif response.status_code == 404:
+    # The API answers 404 when it has no record for the identifier
+    print("No altmetric data found for this DOI")
+else:
+    print(f"Request failed with HTTP {response.status_code}")
+```
+### Batch Queries
+```python
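+import time
+import requests
+# Bulk queries across many outputs use the Altmetric Explorer API (requires subscription).
+# The free API supports one lookup per request by DOI, PubMed ID, or arXiv ID.
+lookups = [("doi", "10.1038/s41586-021-03819-2"), ("pmid", "34234348"), ("arxiv", "2103.14030")]
+for id_type, identifier in lookups:
+    response = requests.get(f"https://api.altmetric.com/v1/{id_type}/{identifier}", timeout=30)
+    score = response.json().get("score") if response.status_code == 200 else None
+    print(f"{id_type}:{identifier} -> {score}")
+    time.sleep(1)  # pace requests to stay polite with the free API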
+```
+## Interpreting Altmetrics Responsibly
+### What Altmetrics Tell You
+- **Speed of dissemination**: How quickly research is being noticed
+- **Audience breadth**: Whether attention comes from academics, media, public, or policymakers
+- **Geographic reach**: Where in the world the work is being discussed
+- **Interdisciplinary interest**: Engagement from unexpected fields
+### What Altmetrics Do NOT Tell You
+- **Quality**: High attention does not equal high quality (controversial or flawed papers can go viral)
+- **Field-normalized comparison**: Raw scores are not comparable across disciplines
+- **Robustness to gaming**: Social media metrics can be artificially inflated
+- **Comprehensive coverage**: Not all platforms and languages are tracked equally
+## Best Practices for Using Altmetrics
+1. **Combine with traditional metrics**: Use altmetrics alongside citation counts, h-index, and peer review to build a complete picture of impact.
+2. **Context matters**: A score of 50 might be exceptional in pure mathematics but ordinary in public health. Check the "Compared to outputs of the same age" percentile.
+3. **Report responsibly**: When including altmetrics in CVs or grant applications, explain what the numbers mean (e.g., "Top 5% of all research outputs tracked by Altmetric.com").
+4. **Track over time**: Set up alerts for your publications to monitor engagement trends.
+5. **Explore the sources**: Click through to see who is discussing your work and in what context. A single policy document mention may be more meaningful than 100 tweets.
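The "compared to outputs of the same age" figure is a percentile rank within a peer group. This minimal sketch computes one from a list of peer scores. Choosing the peer set (same field, same publication window) is left to you, and this is not a reproduction of Altmetric's exact method. `percentile_rank` is a hypothetical helper.

```python
from bisect import bisect_left

def percentile_rank(score: float, peer_scores: list) -> float:
    """Percentage of peer outputs with a strictly lower score."""
    if not peer_scores:
        raise ValueError("need at least one peer score")
    ordered = sorted(peer_scores)
    # bisect_left counts how many peers score strictly below `score`
    return 100.0 * bisect_left(ordered, score) / len(ordered)
```

For example, `percentile_rank(50, [10, 20, 30, 40, 60])` returns `80.0`. The same score of 50 against a public-health peer group with higher typical scores would rank much lower, which is the point of comparing within context.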
+## Tools for Tracking Your Research Impact
+| Tool | Cost | Features |
+|------|------|----------|
+| Altmetric.com Bookmarklet | Free | One-click altmetrics for any paper |
+| ImpactStory / OurResearch | Free | ORCID-based open access and impact profiles |
+| Google Scholar Profile | Free | Citation tracking, h-index, i10-index |
+| PlumX Dashboard | Institutional | Comprehensive multi-source tracking |
+| Dimensions | Free tier | Citations + grants + patents + clinical trials |