@wentorai/research-plugins 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (252)

package/skills/domains/ai-ml/transformer-architecture-guide/SKILL.md ADDED

@@ -0,0 +1,233 @@
+---
+name: transformer-architecture-guide
+description: "Guide to Transformer architectures for NLP and computer vision"
+metadata:
+  openclaw:
+    emoji: "brain"
+    category: "domains"
+    subcategory: "ai-ml"
+    keywords: ["Transformer", "neural network", "deep learning", "NLP", "computer vision"]
+    source: "wentor-research-plugins"
+---
+# Transformer Architecture Guide
+Understand, implement, and adapt Transformer architectures for NLP, computer vision, and multimodal research, from the original attention mechanism to modern variants.
+## The Original Transformer
+The Transformer (Vaswani et al., 2017, "Attention Is All You Need") replaced recurrence and convolution with self-attention as the primary sequence modeling mechanism.
+### Core Components
+| Component | Function | Key Parameters |
+|-----------|----------|---------------|
+| Multi-Head Self-Attention | Computes attention weights across all positions | d_model, n_heads, d_k, d_v |
+| Feed-Forward Network | Position-wise nonlinear transformation | d_model, d_ff |
+| Positional Encoding | Injects sequence order information | Sinusoidal or learned |
+| Layer Normalization | Stabilizes training | Pre-norm or post-norm |
+| Residual Connections | Enables gradient flow in deep networks | Add before or after norm |
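The sinusoidal option listed in the table can be sketched in plain Python. This is a minimal illustration of the formulation from Vaswani et al. (2017), not code from this package: the function name `sinusoidal_positional_encoding` is hypothetical, and a real model would build the table as a tensor rather than nested lists.

```python
import math


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> list[list[float]]:
    """Return a seq_len x d_model table of sinusoidal position encodings."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even so sin/cos dimensions pair up")
    table = []
    for pos in range(seq_len):
        row = [0.0] * d_model
        for i in range(0, d_model, 2):
            # Each (sin, cos) pair shares one frequency: 1 / 10000^(i / d_model)
            angle = pos / (10000 ** (i / d_model))
            row[i] = math.sin(angle)      # even dimensions use sine
            row[i + 1] = math.cos(angle)  # odd dimensions use cosine
        table.append(row)
    return table


pe = sinusoidal_positional_encoding(seq_len=4, d_model=8)
print(pe[0][:4])  # position 0 gives sin(0)=0.0 and cos(0)=1.0 in every pair
```

Because the encoding is a fixed function of position, it adds no trainable parameters, which is the main practical difference from the learned alternative.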
+### Self-Attention Mechanism
+```python
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+import math
+class MultiHeadAttention(nn.Module):
+    def __init__(self, d_model=512, n_heads=8):
+        super().__init__()
+        self.d_model = d_model
+        self.n_heads = n_heads
+        self.d_k = d_model // n_heads
+        self.W_q = nn.Linear(d_model, d_model)
+        self.W_k = nn.Linear(d_model, d_model)
+        self.W_v = nn.Linear(d_model, d_model)
+        self.W_o = nn.Linear(d_model, d_model)
+    def forward(self, Q, K, V, mask=None):
+        batch_size = Q.size(0)
+        # Linear projections and reshape for multi-head
+        Q = self.W_q(Q).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
+        K = self.W_k(K).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
+        V = self.W_v(V).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
+        # Scaled dot-product attention
+        scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(self.d_k)
+        if mask is not None:
+            # Use the dtype's most negative value so the fill also works in float16
+            scores = scores.masked_fill(mask == 0, torch.finfo(scores.dtype).min)
+        attn_weights = F.softmax(scores, dim=-1)
+        context = torch.matmul(attn_weights, V)
+        # Concatenate heads and project
+        context = context.transpose(1, 2).contiguous().view(batch_size, -1, self.d_model)
+        return self.W_o(context)
+```
+### Complete Transformer Block
+```python
+class TransformerBlock(nn.Module):
+    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
+        super().__init__()
+        self.attention = MultiHeadAttention(d_model, n_heads)
+        self.norm1 = nn.LayerNorm(d_model)
+        self.norm2 = nn.LayerNorm(d_model)
+        self.ffn = nn.Sequential(
+            nn.Linear(d_model, d_ff),
+            nn.GELU(),
+            nn.Dropout(dropout),
+            nn.Linear(d_ff, d_model),
+            nn.Dropout(dropout)
+        )
+        self.dropout = nn.Dropout(dropout)
+    def forward(self, x, mask=None):
+        # Pre-norm architecture (GPT-style): normalize once, then reuse as Q, K, and V
+        h = self.norm1(x)
+        attn_out = self.attention(h, h, h, mask)
+        x = x + self.dropout(attn_out)
+        # Feed-forward sublayer; dropout is already applied inside self.ffn
+        ffn_out = self.ffn(self.norm2(x))
+        x = x + ffn_out
+        return x
+```
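To get a feel for the size of one such block, the parameter count can be worked out by hand. The sketch below assumes the layer shapes used in the two classes above (four `d_model x d_model` projections with biases, a two-layer feed-forward network with biases, and two LayerNorms with scale and shift); the helper name `transformer_block_params` is hypothetical.

```python
def transformer_block_params(d_model: int = 512, d_ff: int = 2048) -> dict:
    """Count trainable parameters in one attention + feed-forward block."""
    # Four d_model x d_model projections (W_q, W_k, W_v, W_o), each with a bias
    attention = 4 * (d_model * d_model + d_model)
    # Two linear layers: d_model -> d_ff and d_ff -> d_model, each with a bias
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two LayerNorms, each with a scale vector and a shift vector
    layer_norms = 2 * (2 * d_model)
    return {
        "attention": attention,
        "ffn": ffn,
        "layer_norms": layer_norms,
        "total": attention + ffn + layer_norms,
    }


counts = transformer_block_params()
print(counts)  # total is 3,152,384 for d_model=512, d_ff=2048
```

The number of heads does not appear in the count: splitting `d_model` across heads changes how the projections are reshaped, not how many weights they hold.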
+## Major Transformer Variants
+### Architecture Taxonomy
+| Architecture | Type | Key Innovation | Representative Model |
+|-------------|------|---------------|---------------------|
+| Encoder-only | Bidirectional | Masked language modeling | BERT, RoBERTa |
+| Decoder-only | Autoregressive | Causal language modeling | GPT, LLaMA, Claude |
+| Encoder-Decoder | Seq2seq | Cross-attention between encoder and decoder | T5, BART, mBART |
+### Encoder-Only (BERT Family)
+```python
+# BERT-style masked language modeling
+from transformers import BertTokenizer, BertForMaskedLM
+tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+model = BertForMaskedLM.from_pretrained("bert-base-uncased")
+text = "The Transformer architecture has [MASK] natural language processing."
+inputs = tokenizer(text, return_tensors="pt")
+outputs = model(**inputs)
+# Get predictions for [MASK]
+mask_idx = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
+logits = outputs.logits[0, mask_idx]
+top_tokens = logits.topk(5).indices[0]
+print([tokenizer.decode(t) for t in top_tokens])
+```
+### Decoder-Only (GPT Family)
+```python
+# GPT-style autoregressive generation
+from transformers import GPT2LMHeadModel, GPT2Tokenizer
+tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
+model = GPT2LMHeadModel.from_pretrained("gpt2")
+prompt = "The key innovation of the Transformer is"
+inputs = tokenizer(prompt, return_tensors="pt")
+outputs = model.generate(
+    **inputs,
+    max_new_tokens=50,
+    temperature=0.7,
+    top_p=0.9,
+    do_sample=True,
+    pad_token_id=tokenizer.eos_token_id  # GPT-2 has no pad token; reuse EOS
+)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+## Vision Transformers (ViT)
+The Vision Transformer (Dosovitskiy et al., 2021) applies the Transformer to image classification:
+```python
+class VisionTransformer(nn.Module):
+    def __init__(self, img_size=224, patch_size=16, in_channels=3,
+                 d_model=768, n_heads=12, n_layers=12, n_classes=1000):
+        super().__init__()
+        self.patch_size = patch_size
+        n_patches = (img_size // patch_size) ** 2
+        # Patch embedding: split image into patches and project
+        self.patch_embed = nn.Conv2d(in_channels, d_model,
+                                     kernel_size=patch_size, stride=patch_size)
+        # Learnable [CLS] token and position embeddings
+        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
+        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, d_model))
+        # Transformer blocks (MLP width 4 * d_model, i.e. 3072 for d_model=768)
+        self.blocks = nn.ModuleList([
+            TransformerBlock(d_model, n_heads, d_ff=4 * d_model) for _ in range(n_layers)
+        ])
+        self.norm = nn.LayerNorm(d_model)
+        self.head = nn.Linear(d_model, n_classes)
+    def forward(self, x):
+        B = x.size(0)
+        # Patchify and flatten
+        x = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, n_patches, d_model)
+        # Prepend CLS token
+        cls = self.cls_token.expand(B, -1, -1)
+        x = torch.cat([cls, x], dim=1)
+        x = x + self.pos_embed
+        # Transformer blocks
+        for block in self.blocks:
+            x = block(x)
+        # Classification from CLS token
+        x = self.norm(x[:, 0])
+        return self.head(x)
+```
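The shapes implied by the default arguments can be checked with a few lines of arithmetic. This is only a sketch of the bookkeeping, using the same defaults as the class above; the helper name `vit_sequence_shape` is hypothetical.

```python
def vit_sequence_shape(img_size: int = 224, patch_size: int = 16,
                       in_channels: int = 3, d_model: int = 768) -> dict:
    """Derive patch count, sequence length, and patch-embedding size for a ViT."""
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    n_patches = (img_size // patch_size) ** 2
    # Each patch is flattened to in_channels * patch_size^2 values before projection
    patch_dim = in_channels * patch_size * patch_size
    return {
        "n_patches": n_patches,
        "seq_len": n_patches + 1,  # +1 for the prepended [CLS] token
        "patch_dim": patch_dim,
        # The strided Conv2d is equivalent to a linear map patch_dim -> d_model plus bias
        "patch_embed_params": patch_dim * d_model + d_model,
    }


shape = vit_sequence_shape()
print(shape)  # 196 patches, sequence length 197
```

Halving the patch size quadruples the sequence length, and with it the cost of full self-attention grows roughly sixteenfold.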
+## Efficient Transformer Variants
+| Method | Complexity | Key Idea | Reference |
+|--------|-----------|----------|-----------|
+| Standard attention | O(n^2) | Full pairwise attention | Vaswani et al., 2017 |
+| Linear attention | O(n) | Kernel approximation of softmax | Katharopoulos et al., 2020 |
+| Flash Attention | O(n^2) time, O(n) memory | IO-aware tiled computation | Dao et al., 2022 |
+| Sparse attention | O(n sqrt(n)) | Fixed or learned sparse patterns | Child et al., 2019 |
+| Sliding window | O(n * w) | Local attention window | Beltagy et al., 2020 (Longformer) |
+| Multi-query attention | O(n^2) but faster | Shared K/V across heads | Shazeer, 2019 |
+| Grouped-query attention | O(n^2) but faster | Groups of heads share K/V | Ainslie et al., 2023 |
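The "faster" entries for multi-query and grouped-query attention come largely from a smaller key/value cache during generation. The sketch below compares cache sizes under an illustrative configuration (32 layers, 32 query heads, head dimension 128, 4096 tokens, 2 bytes per value); these numbers are assumptions chosen for the example, not the settings of any particular model, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, d_head: int, seq_len: int,
                   batch_size: int = 1, bytes_per_value: int = 2) -> int:
    """Size of the key/value cache for autoregressive decoding."""
    # Factor of 2: one cached tensor for keys and one for values
    return 2 * n_layers * n_kv_heads * d_head * seq_len * batch_size * bytes_per_value


# Illustrative configuration (assumed, not taken from a specific model)
config = dict(n_layers=32, d_head=128, seq_len=4096)
cache_mib = {
    "multi-head (32 KV heads)": kv_cache_bytes(n_kv_heads=32, **config) // 2**20,
    "grouped-query (8 KV heads)": kv_cache_bytes(n_kv_heads=8, **config) // 2**20,
    "multi-query (1 KV head)": kv_cache_bytes(n_kv_heads=1, **config) // 2**20,
}
for name, mib in cache_mib.items():
    print(f"{name}: {mib} MiB")
```

The attention computation itself remains quadratic in sequence length in all three cases; only the memory that must be read per generated token shrinks.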
+## Model Scaling Laws
+Kaplan et al. (2020) and Hoffmann et al. (2022, "Chinchilla") established scaling laws:
+```
+Kaplan et al. (2020): loss L follows a power law in each factor
+when the other two are not the bottleneck:
+- Model parameters (N, non-embedding): L ~ N^(-0.076)
+- Dataset size (D, tokens): L ~ D^(-0.095)
+- Compute budget (C, optimally allocated): L ~ C^(-0.050)
+Chinchilla compute-optimal scaling (Hoffmann et al., 2022):
+- For compute budget C, scale model size and training tokens in equal proportion (each ~ C^0.5)
+- Optimal tokens ~ 20 * parameters
+- Example: 70B parameter model needs ~1.4T training tokens
+```
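The 20-tokens-per-parameter rule of thumb can be turned into a small calculator. The sketch below additionally assumes the common approximation that training costs about 6 FLOPs per parameter per token (C ≈ 6 * N * D); both the approximation and the helper name `chinchilla_optimal` are assumptions of this example, and the result is a rough planning figure rather than a fitted prediction.

```python
import math

TOKENS_PER_PARAM = 20      # rule of thumb from the Chinchilla result
FLOPS_PER_PARAM_TOKEN = 6  # assumed approximation: C ~ 6 * N * D


def chinchilla_optimal(compute_flops: float) -> dict:
    """Split a compute budget into model size N and token count D with D = 20 * N."""
    # C = 6 * N * D and D = 20 * N  =>  N = sqrt(C / (6 * 20))
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    return {"params": n_params, "tokens": TOKENS_PER_PARAM * n_params}


# Compute implied by the 70B / 1.4T example above
budget = FLOPS_PER_PARAM_TOKEN * 70e9 * 1.4e12
plan = chinchilla_optimal(budget)
print(f"budget={budget:.3g} FLOPs, params={plan['params']:.3g}, tokens={plan['tokens']:.3g}")
```

Running the calculation in reverse on the 70B example recovers the same model size and token count, which is a quick consistency check on the two rules.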
+## Research Resources
+| Resource | Description |
+|----------|-------------|
+| Hugging Face Transformers | Pre-trained models and fine-tuning framework |
+| Papers With Code | Benchmarks, SOTA tracking, and code links |
+| The Illustrated Transformer (Jay Alammar) | Visual explanations of attention |
+| Andrej Karpathy's nanoGPT | Minimal GPT implementation for education |
+| EleutherAI | Open-source LLM research community |
+| MLCommons | Standardized ML benchmarks (MLPerf) |

package/skills/domains/biomedical/clinical-research-guide/SKILL.md ADDED

@@ -0,0 +1,232 @@
+---
+name: clinical-research-guide
+description: "Design clinical studies and report using CONSORT, STROBE guidelines"
+metadata:
+  openclaw:
+    emoji: "hospital"
+    category: "domains"
+    subcategory: "biomedical"
+    keywords: ["clinical research", "CONSORT", "STROBE", "clinical trial", "study design", "reporting guidelines"]
+    source: "wentor-research-plugins"
+---
+# Clinical Research Guide
+A skill for designing clinical studies and reporting results according to established guidelines. Covers randomized controlled trials (CONSORT), observational studies (STROBE), diagnostic studies (STARD), and systematic reviews (PRISMA).
+## Study Design Selection
+### Hierarchy of Evidence
+```
+Systematic Reviews / Meta-analyses
+       |
+Randomized Controlled Trials (RCTs)
+       |
+Cohort Studies (prospective)
+       |
+Case-Control Studies
+       |
+Cross-Sectional Studies
+       |
+Case Reports / Case Series
+       |
+Expert Opinion
+Choose the design that best answers your research question
+given ethical, practical, and resource constraints.
+```
+### Design Decision Framework
+```python
+def select_study_design(research_question: str,
+                        can_randomize: bool,
+                        outcome_prevalence: str,
+                        time_constraint: str) -> dict:
+    """
+    Guide selection of clinical study design.
+    Args:
+        research_question: The clinical question
+        can_randomize: Whether randomization is ethical and feasible
+        outcome_prevalence: 'common' or 'rare'
+        time_constraint: 'short', 'medium', or 'long'
+    """
+    if can_randomize:
+        design = {
+            "recommended": "Randomized Controlled Trial (RCT)",
+            "reporting": "CONSORT 2010",
+            "strengths": "Strongest causal inference",
+            "considerations": [
+                "Need equipoise (genuine uncertainty about which is better)",
+                "Blinding may or may not be feasible",
+                "Intent-to-treat analysis is the primary approach",
+                "Pre-register at ClinicalTrials.gov or ISRCTN"
+            ]
+        }
+    elif outcome_prevalence == "rare":
+        design = {
+            "recommended": "Case-Control Study",
+            "reporting": "STROBE",
+            "strengths": "Efficient for rare outcomes",
+            "considerations": [
+                "Select controls carefully (matching, population-based)",
+                "Recall bias is a major threat",
+                "Can only calculate odds ratios, not incidence"
+            ]
+        }
+    elif time_constraint == "short":
+        design = {
+            "recommended": "Cross-Sectional Study",
+            "reporting": "STROBE (cross-sectional checklist)",
+            "strengths": "Quick, inexpensive, good for prevalence",
+            "considerations": [
+                "Cannot establish temporal sequence",
+                "Prevalence bias (overrepresents chronic conditions)",
+                "Useful for hypothesis generation"
+            ]
+        }
+    else:
+        design = {
+            "recommended": "Prospective Cohort Study",
+            "reporting": "STROBE",
+            "strengths": "Can establish temporal sequence, multiple outcomes",
+            "considerations": [
+                "Loss to follow-up is the main threat",
+                "Confounding must be addressed analytically",
+                "Expensive and time-consuming"
+            ]
+        }
+    return design
+```
+## CONSORT for Randomized Trials
+### Essential CONSORT Checklist Items
+```
+Title and Abstract:
+  - Identify as randomized trial in the title
+  - Structured abstract with trial design, methods, results, conclusions
+Methods:
+  - Trial design (parallel, crossover, factorial, etc.)
+  - Participants: Eligibility criteria, settings, locations
+  - Interventions: Precise details of interventions for each group
+  - Outcomes: Primary and secondary, how and when assessed
+  - Sample size: Calculation with assumptions stated
+  - Randomization: Sequence generation, allocation concealment
+  - Blinding: Who was blinded, how blinding was maintained
+Results:
+  - CONSORT flow diagram (enrollment, allocation, follow-up, analysis)
+  - Baseline demographic table (Table 1)
+  - Primary outcome with effect size and confidence interval
+  - Harms and adverse events
+Discussion:
+  - Limitations including sources of potential bias
+  - Generalizability
+  - Interpretation consistent with results
+```
+### CONSORT Flow Diagram
+```
+                    Assessed for eligibility (n=...)
+                              |
+                    Excluded (n=...)
+                    - Not meeting criteria (n=...)
+                    - Declined to participate (n=...)
+                    - Other reasons (n=...)
+                              |
+                    Randomized (n=...)
+                    /                    \
+            Allocated to               Allocated to
+            intervention (n=...)       control (n=...)
+                    |                        |
+            Lost to follow-up          Lost to follow-up
+            (n=..., reasons)           (n=..., reasons)
+                    |                        |
+            Analyzed (n=...)           Analyzed (n=...)
+            Excluded from analysis     Excluded from analysis
+            (n=..., reasons)           (n=..., reasons)
+```
+## STROBE for Observational Studies
+### Key STROBE Requirements
+```
+Study design specific items:
+Cohort:
+  - Report follow-up time (person-years, median)
+  - Report loss to follow-up with reasons
+  - Use hazard ratios or incidence rate ratios
+Case-Control:
+  - Describe case definition and case ascertainment
+  - Describe control selection (source, matching criteria)
+  - Report odds ratios with confidence intervals
+Cross-Sectional:
+  - Report response rate and non-response analysis
+  - Describe how the sample represents the target population
+  - Report prevalence with confidence intervals
+```
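For the case-control item above, an odds ratio with its confidence interval can be computed from a 2x2 table. The sketch below uses the standard log-based (Woolf) interval; the function name `odds_ratio_ci` and the counts in the example are made up for illustration.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(exposed_cases: int, unexposed_cases: int,
                  exposed_controls: int, unexposed_controls: int,
                  confidence: float = 0.95) -> dict:
    """Odds ratio and Woolf (log-based) confidence interval from a 2x2 table."""
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) is the root of the summed reciprocal cell counts
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return {
        "odds_ratio": odds_ratio,
        "ci_lower": math.exp(math.log(odds_ratio) - z * se_log_or),
        "ci_upper": math.exp(math.log(odds_ratio) + z * se_log_or),
    }


# Hypothetical counts: 20 of 100 cases exposed, 10 of 100 controls exposed
result = odds_ratio_ci(exposed_cases=20, unexposed_cases=80,
                       exposed_controls=10, unexposed_controls=90)
print(result)
```

In this made-up example the point estimate is 2.25 but the interval includes 1, which is exactly the situation where reporting the interval matters more than the estimate alone.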
+## Sample Size and Power
+### Power Calculation
+```python
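+def power_analysis_rct(effect_size: float, alpha: float = 0.05,
+                       power: float = 0.80, ratio: float = 1.0) -> dict:
+    """
+    Calculate required sample size for a two-arm RCT with a continuous outcome,
+    using the normal approximation to the two-sample comparison of means.
+    Args:
+        effect_size: Expected Cohen's d (standardized mean difference)
+        alpha: Significance level (two-sided)
+        power: Desired statistical power
+        ratio: Allocation ratio n_control / n_treatment (1.0 means equal arms)
+    """
+    from scipy import stats
+    import math
+    if effect_size == 0:
+        raise ValueError("effect_size must be non-zero")
+    z_alpha = stats.norm.ppf(1 - alpha / 2)
+    z_beta = stats.norm.ppf(power)
+    # Treatment-arm size; the control arm gets ratio * n_per_arm participants
+    n_per_arm = math.ceil(
+        ((z_alpha + z_beta) ** 2 * (1 + 1 / ratio)) / effect_size ** 2
+    )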
+    return {
+        "n_per_arm": n_per_arm,
+        "total_n": n_per_arm + math.ceil(n_per_arm * ratio),
+        "parameters": {
+            "effect_size": effect_size,
+            "alpha": alpha,
+            "power": power,
+            "allocation_ratio": f"1:{ratio}"
+        },
+        "note": "Add 10-20% for anticipated dropout"
+    }
+```
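A worked example makes the formula concrete. The sketch below repeats the equal-allocation calculation with only the standard library (`statistics.NormalDist` in place of SciPy) and then inflates the result for dropout by dividing by the expected retention, which is one common convention; the 15% dropout figure and the helper names are assumptions for illustration.

```python
import math
from statistics import NormalDist


def n_per_arm_equal_allocation(effect_size: float, alpha: float = 0.05,
                               power: float = 0.80) -> int:
    """Per-arm sample size for a 1:1 two-arm trial (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # n = 2 * (z_alpha + z_beta)^2 / d^2 for equal arms
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


def inflate_for_dropout(n: int, dropout: float) -> int:
    """Enroll enough participants that n remain after the expected dropout."""
    if not 0 <= dropout < 1:
        raise ValueError("dropout must be in [0, 1)")
    return math.ceil(n / (1 - dropout))


n_needed = n_per_arm_equal_allocation(effect_size=0.5)   # medium effect, d = 0.5
n_enrolled = inflate_for_dropout(n_needed, dropout=0.15)  # assumed 15% dropout
print(f"Analyzable per arm: {n_needed}, to enroll per arm: {n_enrolled}")
```

The normal approximation gives 63 per arm here; an exact t-test calculation is typically one or two participants larger, so treat the figure as a lower bound when the sample is small.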
+## Other Reporting Guidelines
+| Guideline | Study Type | Checklist Items |
+|-----------|-----------|-----------------|
+| CONSORT | Randomized trials | 25 items + flow diagram |
+| STROBE | Observational studies | 22 items |
+| STARD | Diagnostic accuracy studies | 30 items |
+| PRISMA | Systematic reviews | 27 items + flow diagram |
+| TRIPOD | Prediction models | 22 items |
+| SPIRIT | Trial protocols | 33 items |
+| CARE | Case reports | 13 items |
+All checklists are available at the EQUATOR Network (equator-network.org). Most journals require submission of the relevant checklist with your manuscript. Completing the checklist during manuscript writing, not after, ensures comprehensive reporting.

package/skills/domains/biomedical/clinicaltrials-api/SKILL.md ADDED

@@ -0,0 +1,177 @@
+---
+name: clinicaltrials-api
+description: "Clinical trial registry database search API"
+metadata:
+  openclaw:
+    emoji: "🔍"
+    category: "domains"
+    subcategory: "biomedical"
+    keywords: ["clinical medicine", "epidemiology", "public health", "evidence-based medicine"]
+    source: "https://clinicaltrials.gov/api/"
+---
+# ClinicalTrials API Guide
+## Overview
+ClinicalTrials.gov is a registry and results database of publicly and privately supported clinical studies conducted around the world, maintained by the U.S. National Library of Medicine (NLM) at the National Institutes of Health (NIH). It contains registration records for over 500,000 clinical studies from more than 220 countries, making it one of the largest clinical trial registries in the world.
+The ClinicalTrials.gov API provides programmatic access to this vast repository of clinical trial data. Researchers can search for trials by condition, intervention, sponsor, location, phase, status, and many other criteria. The API returns detailed structured data including study design, eligibility criteria, outcome measures, enrollment information, study contacts, and results when available.
+Clinical researchers, pharmaceutical companies, systematic reviewers, epidemiologists, public health officials, patient advocacy groups, and health policy analysts use the ClinicalTrials.gov API to monitor the clinical trial landscape, identify recruiting studies, conduct meta-analyses, analyze research trends, and ensure comprehensive evidence coverage in systematic reviews. The database is a critical resource for evidence-based medicine and regulatory compliance.
+## Authentication
+No authentication required. The ClinicalTrials.gov API is freely accessible without any API key, token, or registration. All endpoints are publicly available. Users are expected to comply with the NCBI usage policies and make requests at a reasonable rate.
+## Core Endpoints
+### studies: Search Clinical Trials
+Search the ClinicalTrials.gov database and retrieve full study records with comprehensive metadata about trial design, eligibility, interventions, and outcomes.
+- **URL**: `GET https://clinicaltrials.gov/api/v2/studies`
+- **Parameters**:
+| Parameter            | Type    | Required | Description                                                        |
+|----------------------|---------|----------|--------------------------------------------------------------------|
+| query.term           | string  | No       | Free-text search query across all fields                           |
+| query.cond           | string  | No       | Condition or disease filter                                        |
+| query.intr           | string  | No       | Intervention or treatment filter                                   |
+| query.spons          | string  | No       | Sponsor or collaborator filter                                     |
+| filter.overallStatus | string  | No       | Status filter: `RECRUITING`, `COMPLETED`, `ACTIVE_NOT_RECRUITING`  |
+| filter.phase         | string  | No       | Phase filter: `PHASE1`, `PHASE2`, `PHASE3`, `PHASE4`               |
+| filter.geo           | string  | No       | Geographic filter, e.g. `distance(39.0,-77.1,50mi)`                |
+| sort                 | string  | No       | Sort field and direction                                           |
+| countTotal           | boolean | No       | Include `totalCount` in the first page of results (default false)  |
+| pageSize             | int     | No       | Results per page (default 10, max 1000)                            |
+| pageToken            | string  | No       | Token for next page of results                                     |
+| format               | string  | No       | Response format: `json` (default) or `csv`                         |
+- **Example**:
+```bash
+# Search for recruiting cancer immunotherapy trials
+curl "https://clinicaltrials.gov/api/v2/studies?query.cond=cancer&query.intr=immunotherapy&filter.overallStatus=RECRUITING&pageSize=5"
+# Search by sponsor
+curl "https://clinicaltrials.gov/api/v2/studies?query.spons=NIH&filter.phase=PHASE3&pageSize=10"
+```
+- **Response**: Returns a `studies` array, a `nextPageToken` when more pages exist, and `totalCount` when the request sets `countTotal=true`. Each study contains `protocolSection` with `identificationModule` (NCT ID, title, organization), `statusModule` (overall status, start/completion dates), `descriptionModule` (brief summary, detailed description), `conditionsModule` (conditions and keywords), `designModule` (study type, phases, allocation, intervention model), `armsInterventionsModule`, `eligibilityModule` (criteria, sex, age range), `contactsLocationsModule`, and `outcomesModule`.
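Because many of these modules are optional on any given record, it helps to flatten a study defensively. The sketch below pulls a few fields out of one record using the module names listed above; the function name `summarize_study`, the `phases` and `conditions` keys, and the sample record (including its placeholder NCT ID) are assumptions made for illustration rather than output captured from the API.

```python
def summarize_study(study: dict) -> dict:
    """Flatten one study record into a few commonly needed fields."""
    proto = study.get("protocolSection", {})
    ident = proto.get("identificationModule", {})
    status = proto.get("statusModule", {})
    return {
        "nct_id": ident.get("nctId"),
        "title": ident.get("briefTitle"),
        "status": status.get("overallStatus"),
        # Dates are nested one level deeper and may be absent
        "start_date": status.get("startDateStruct", {}).get("date"),
        "phases": proto.get("designModule", {}).get("phases", []),
        "conditions": proto.get("conditionsModule", {}).get("conditions", []),
    }


# Hypothetical record shaped like the response described above
sample = {
    "protocolSection": {
        "identificationModule": {"nctId": "NCT00000000", "briefTitle": "Example Trial"},
        "statusModule": {"overallStatus": "RECRUITING"},
        "designModule": {"phases": ["PHASE3"]},
    }
}
summary = summarize_study(sample)
print(summary)
```

Using `.get` with empty defaults means a record that lacks a start date or a conditions module yields `None` or an empty list instead of raising `KeyError` halfway through a long result set.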
+### Single Study Retrieval
+Retrieve a specific clinical trial by its NCT identifier.
+- **URL**: `GET https://clinicaltrials.gov/api/v2/studies/{nctId}`
+- **Parameters**:
+| Parameter | Type   | Required | Description                                |
+|-----------|--------|----------|--------------------------------------------|
+| nctId     | string | Yes      | NCT identifier (e.g., `NCT04280705`)      |
+- **Example**:
+```bash
+curl "https://clinicaltrials.gov/api/v2/studies/NCT04280705"
+```
+- **Response**: Returns the complete study record with all protocol sections, including detailed eligibility criteria, primary and secondary outcome measures, study arms and interventions, sponsor information, and references.
+## Rate Limits
+No formal rate limits are documented for the ClinicalTrials.gov API. The figures often quoted alongside it (no more than 3 requests per second without an API key, and up to 10 requests per second with an NCBI API key) come from the NCBI usage guidelines; since the ClinicalTrials.gov API takes no API key, treat 3 requests per second as a conservative ceiling rather than a documented limit. For bulk data access, ClinicalTrials.gov has provided downloadable data files at https://clinicaltrials.gov/AllPublicXML.zip (confirm that this download is still available before relying on it), and the AACT (Aggregate Analysis of ClinicalTrials.gov) database is available at https://aact.ctti-clinicaltrials.org/.
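A small client-side throttle is one way to stay under a self-imposed request rate. The sketch below enforces a minimum interval between calls; the class name `MinIntervalThrottle` is hypothetical, the 3-per-second default simply mirrors the conservative figure mentioned above, and the clock and sleep functions are injectable so the behavior can be checked without waiting.

```python
import time


class MinIntervalThrottle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, max_per_second: float = 3.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous call was released

    def wait(self) -> float:
        """Block until the interval has elapsed; return the delay applied."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


throttle = MinIntervalThrottle(max_per_second=3.0)
for nct_id in ["NCT04280705"]:
    throttle.wait()
    # A real client would issue the HTTP request for nct_id here
```

Calling `throttle.wait()` immediately before each `requests.get` in a pagination loop keeps long systematic-review searches polite without changing the rest of the code.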
+## Common Patterns
+### Monitor Recruiting Trials for a Condition
+Track actively recruiting trials for a specific disease or condition:
+```python
+import requests
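+params = {
+    "query.cond": "Alzheimer's Disease",
+    "filter.overallStatus": "RECRUITING",
+    "filter.phase": "PHASE3",
+    "countTotal": "true",  # totalCount is only returned when requested
+    "pageSize": 20
+}
+resp = requests.get("https://clinicaltrials.gov/api/v2/studies", params=params, timeout=30)
+resp.raise_for_status()  # surface HTTP errors (e.g., a rejected parameter) early
+data = resp.json()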
+print(f"Found {data['totalCount']} recruiting Phase 3 Alzheimer's trials\n")
+for study in data["studies"]:
+    proto = study["protocolSection"]
+    ident = proto["identificationModule"]
+    status = proto["statusModule"]
+    print(f"{ident['nctId']}: {ident['briefTitle']}")
+    print(f"  Status: {status['overallStatus']}")
+    print(f"  Start: {status.get('startDateStruct', {}).get('date', 'N/A')}")
+    print()
+```
+### Systematic Review: Comprehensive Trial Search
+Perform a comprehensive search for systematic review inclusion screening:
+```python
+import requests
+all_studies = []
+page_token = None
+while True:
+    params = {
+        "query.cond": "type 2 diabetes",
+        "query.intr": "metformin",
+        "filter.overallStatus": "COMPLETED",
+        "pageSize": 100
+    }
+    if page_token:
+        params["pageToken"] = page_token
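+    resp = requests.get("https://clinicaltrials.gov/api/v2/studies", params=params, timeout=30)
+    resp.raise_for_status()  # stop paging on an HTTP error instead of parsing an error body
+    data = resp.json()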
+    all_studies.extend(data["studies"])
+    page_token = data.get("nextPageToken")
+    if not page_token:
+        break
+print(f"Total completed metformin trials for T2D: {len(all_studies)}")
+```
+### Analyze Trial Characteristics
+Extract and analyze study design features for research landscape mapping:
+```python
+import requests
+from collections import Counter
+params = {
+    "query.cond": "COVID-19",
+    "filter.phase": "PHASE3",
+    "pageSize": 100
+}
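+resp = requests.get("https://clinicaltrials.gov/api/v2/studies", params=params, timeout=30)
+resp.raise_for_status()
+data = resp.json()
+sponsors = Counter()
+for study in data["studies"]:
+    # Lead sponsor lives in sponsorCollaboratorsModule; fall back to the submitting organization
+    proto = study["protocolSection"]
+    lead = proto.get("sponsorCollaboratorsModule", {}).get("leadSponsor", {})
+    org = proto["identificationModule"].get("organization", {})
+    sponsors[lead.get("name") or org.get("fullName", "Unknown")] += 1
+print("Top sponsors among the first 100 Phase 3 COVID-19 trials returned:")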
+for sponsor, count in sponsors.most_common(10):
+    print(f"  {sponsor}: {count} trials")
+```
+## References
+- Official API documentation: https://clinicaltrials.gov/data-api/api
+- ClinicalTrials.gov homepage: https://clinicaltrials.gov/
+- AACT database for bulk analysis: https://aact.ctti-clinicaltrials.org/
+- WHO ICTRP (international registry portal): https://trialsearch.who.int/
+- NCBI usage policies: https://www.ncbi.nlm.nih.gov/home/about/policies/