npm - @wentorai/research-plugins - Versions diffs - 1.0.0 - Mend

@wentorai/research-plugins 1.0.0

Files changed (252)

package/skills/domains/biomedical/epidemiology-guide/SKILL.md ADDED

@@ -0,0 +1,200 @@
+---
+name: epidemiology-guide
+description: "Epidemiological study designs, measures of association, and public health ana..."
+metadata:
+  openclaw:
+    emoji: "microscope"
+    category: "domains"
+    subcategory: "biomedical"
+    keywords: ["epidemiology", "public health", "evidence-based medicine", "clinical medicine", "disease surveillance"]
+    source: "wentor"
+---
+# Epidemiology Guide
+A skill for designing and analyzing epidemiological studies. Covers study design selection, measures of disease frequency and association, bias assessment, and public health data analysis methods.
+## Study Design Selection
+### Design Hierarchy
+```
+                    Evidence Strength
+                         |
+    Systematic Review / Meta-Analysis   (Highest)
+                         |
+         Randomized Controlled Trial
+                         |
+              Cohort Study (Prospective)
+                         |
+            Case-Control Study
+                         |
+         Cross-Sectional Study
+                         |
+         Case Report / Case Series       (Lowest)
+```
+### When to Use Each Design
+| Design | Research Question | Time | Cost | Bias Risk |
+|--------|------------------|------|------|-----------|
+| RCT | Does intervention X prevent outcome Y? | Years | Very high | Lowest |
+| Prospective Cohort | Does exposure X increase risk of Y? | Years | High | Moderate |
+| Retrospective Cohort | Historical exposure-outcome relationship? | Months | Moderate | Moderate-High |
+| Case-Control | What exposures are associated with rare disease? | Months | Low | High |
+| Cross-Sectional | What is the prevalence of X? | Weeks | Low | High |
+| Ecological | Do population-level factors correlate with disease? | Weeks | Very low | Very high |
+## Measures of Disease Frequency
+```python
+from typing import Optional
+import numpy as np
+def compute_measures(cases: int, population: int,
+                      person_time: Optional[float] = None,
+                      period_years: float = 1.0) -> dict:
+    """
+    Compute basic epidemiological measures.
+    Args:
+        cases: Number of new cases (for incidence) or existing cases (for prevalence)
+        population: Population at risk
+        person_time: Person-years of follow-up (for incidence rate)
+        period_years: Time period in years (for cumulative incidence)
+    """
+    measures = {}
+    # Point prevalence (meaningful when `cases` counts existing cases)
+    measures['prevalence'] = {
+        'value': cases / population,
+        'per_1000': (cases / population) * 1000,
+        'formula': 'cases / population at a point in time'
+    }
+    # Cumulative incidence (risk; meaningful when `cases` counts new cases)
+    measures['cumulative_incidence'] = {
+        'value': cases / population,
+        'per_1000': (cases / population) * 1000,
+        'period_years': period_years,
+        'formula': 'new cases / population at risk during time period'
+    }
+    # Incidence rate (if person-time available)
+    if person_time:
+        measures['incidence_rate'] = {
+            'value': cases / person_time,
+            'per_1000_py': (cases / person_time) * 1000,
+            'formula': 'new cases / person-time at risk'
+        }
+    return measures
+```
+## Measures of Association
+### Risk Ratio, Odds Ratio, and Attributable Risk
+```python
+import numpy as np
+def measures_of_association(a: int, b: int, c: int, d: int) -> dict:
+    """
+    Compute epidemiological measures of association from a 2x2 table.
+                    Disease+    Disease-
+    Exposed+          a           b        a+b
+    Exposed-          c           d        c+d
+                     a+c         b+d        N
+    Args:
+        a: Exposed with disease
+        b: Exposed without disease
+        c: Unexposed with disease
+        d: Unexposed without disease
+    """
+    # Risk in exposed and unexposed
+    risk_exposed = a / (a + b)
+    risk_unexposed = c / (c + d)
+    # Risk Ratio (Relative Risk)
+    rr = risk_exposed / risk_unexposed
+    ln_rr = np.log(rr)
+    se_ln_rr = np.sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d))
+    rr_ci = (np.exp(ln_rr - 1.96*se_ln_rr), np.exp(ln_rr + 1.96*se_ln_rr))
+    # Odds Ratio
+    or_val = (a * d) / (b * c)
+    ln_or = np.log(or_val)
+    se_ln_or = np.sqrt(1/a + 1/b + 1/c + 1/d)
+    or_ci = (np.exp(ln_or - 1.96*se_ln_or), np.exp(ln_or + 1.96*se_ln_or))
+    # Attributable Risk (Risk Difference)
+    ar = risk_exposed - risk_unexposed
+    se_ar = np.sqrt(risk_exposed*(1-risk_exposed)/(a+b) +
+                     risk_unexposed*(1-risk_unexposed)/(c+d))
+    ar_ci = (ar - 1.96*se_ar, ar + 1.96*se_ar)
+    # Attributable Fraction in Exposed
+    af_exposed = (rr - 1) / rr
+    # Population Attributable Fraction
+    prevalence_exposure = (a + b) / (a + b + c + d)
+    paf = prevalence_exposure * (rr - 1) / (prevalence_exposure * (rr - 1) + 1)
+    return {
+        'risk_ratio': {'value': round(rr, 3), 'ci_95': tuple(round(x, 3) for x in rr_ci)},
+        'odds_ratio': {'value': round(or_val, 3), 'ci_95': tuple(round(x, 3) for x in or_ci)},
+        'risk_difference': {'value': round(ar, 4), 'ci_95': tuple(round(x, 4) for x in ar_ci)},
+        'attributable_fraction_exposed': round(af_exposed, 3),
+        'population_attributable_fraction': round(paf, 3),
+        'number_needed_to_harm': round(1/ar, 1) if ar > 0 else None
+    }
+# Example with hypothetical counts: smoking and lung cancer
+result = measures_of_association(a=80, b=920, c=10, d=990)
+print(f"RR: {result['risk_ratio']['value']} ({result['risk_ratio']['ci_95']})")
+print(f"OR: {result['odds_ratio']['value']} ({result['odds_ratio']['ci_95']})")
+print(f"PAF: {result['population_attributable_fraction']}")
+```
+## Bias Assessment
+### Types of Bias and Mitigation
+| Bias Type | Description | Mitigation Strategy |
+|-----------|------------|-------------------|
+| Selection bias | Non-random sample selection | Random sampling, matching |
+| Information bias | Measurement error in exposure/outcome | Validated instruments, blinding |
+| Recall bias | Differential recall by disease status | Use records, not self-report |
+| Confounding | Third variable associated with the exposure that independently affects the outcome and is not on the causal pathway | Stratification, regression, matching |
+| Lead-time bias | Earlier detection misinterpreted as longer survival | Use mortality, not survival |
+| Healthy worker effect | Workers are healthier than general population | Use employed comparison group |
+### Confounding Assessment
+```python
+def assess_confounding(crude_rr: float, adjusted_rr: float,
+                        threshold: float = 0.10) -> dict:
+    """
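+    Flag confounding using the change-in-estimate rule: the variable counts
+    as a confounder when adjusting for it shifts the RR by more than
+    `threshold`, expressed as a fraction of the crude RR (0.10 = 10%).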
+    """
+    pct_change = abs(crude_rr - adjusted_rr) / crude_rr * 100
+    return {
+        'crude_RR': crude_rr,
+        'adjusted_RR': adjusted_rr,
+        'percent_change': round(pct_change, 1),
+        'is_confounder': pct_change > threshold * 100,
+        'interpretation': (
+            f"{'Confounder detected' if pct_change > threshold * 100 else 'Not a confounder'}: "
+            f"adjusting changed the RR by {pct_change:.1f}% "
+            f"(threshold: {threshold*100:.0f}%)"
+        )
+    }
+```
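The `adjusted_rr` passed to `assess_confounding` has to come from an adjusted analysis. One option is stratification, which the bias table lists as a mitigation for confounding. The following is a minimal sketch of the Mantel-Haenszel pooled risk ratio, assuming each stratum is supplied as an `(a, b, c, d)` tuple laid out like the 2x2 table earlier. The function name `mantel_haenszel_rr` and the counts are illustrative and not part of the package.

```python
def mantel_haenszel_rr(strata):
    """Pooled risk ratio across strata of a suspected confounder.

    strata: iterable of (a, b, c, d) tuples, one 2x2 table per stratum
            (a = exposed cases, b = exposed non-cases,
             c = unexposed cases, d = unexposed non-cases).
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * (c + d) / n      # exposed cases weighted by unexposed total
        denominator += c * (a + b) / n    # unexposed cases weighted by exposed total
    return numerator / denominator


# Hypothetical counts: exposure is more common in the higher-risk stratum
strata = [
    (40, 60, 20, 80),    # stratum 1
    (5, 95, 10, 390),    # stratum 2
]

# Crude RR from the table collapsed over strata
a, b, c, d = (sum(s[i] for s in strata) for i in range(4))
crude_rr = (a / (a + b)) / (c / (c + d))

adjusted_rr = mantel_haenszel_rr(strata)
pct_change = abs(crude_rr - adjusted_rr) / crude_rr * 100
print(f"crude RR={crude_rr:.2f}, adjusted RR={adjusted_rr:.2f}, change={pct_change:.1f}%")
```

In this made-up example both strata have the same stratum-specific RR, so the gap between the crude and pooled estimates comes entirely from the stratifying variable.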
+## Survival Analysis
+For time-to-event data, use Kaplan-Meier estimators for descriptive analysis, log-rank tests for group comparisons, and Cox proportional hazards regression for multivariable analysis. Always check the proportional hazards assumption using Schoenfeld residuals and report median survival times with 95% confidence intervals.
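At each event time, the Kaplan-Meier estimator mentioned above multiplies the running survival probability by the fraction of at-risk subjects who survive that time. The following is a minimal pure-Python sketch, assuming right-censored data given as parallel lists of follow-up times and event flags. The names `kaplan_meier` and `median_survival` and the data are illustrative and not part of the package. Confidence intervals, log-rank tests, and Cox models are out of scope here and are better taken from a maintained survival-analysis library.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations: follow-up time for each subject
    events:    1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival_probability), one entry per event time.
    """
    # Count events and censorings at each distinct time
    by_time = {}
    for t, e in zip(durations, events):
        d, c = by_time.get(t, (0, 0))
        by_time[t] = (d + 1, c) if e else (d, c + 1)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(by_time):
        d, c = by_time[t]
        if d > 0:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= d + c  # events and censorings both leave the risk set
    return curve


def median_survival(curve):
    """First time at which the survival estimate is 0.5 or lower."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Hypothetical follow-up times in months; 0 marks a censored subject
durations = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 1, 0, 1, 0]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t={t:>2}  S(t)={s:.3f}")
print("median survival:", median_survival(curve))
```

Subjects censored at the same time as an event are treated as still at risk for that event, which is the usual convention.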
+## Reporting Standards
+Follow STROBE (observational studies), CONSORT (trials), or RECORD (routinely collected data) reporting guidelines. Report all measures with 95% confidence intervals. Present both crude and adjusted estimates to show the impact of confounding adjustment.

package/skills/domains/biomedical/genomics-analysis-guide/SKILL.md ADDED

@@ -0,0 +1,270 @@
+---
+name: genomics-analysis-guide
+description: "Workflows for RNA-seq, GWAS, and variant calling in genomic research"
+metadata:
+  openclaw:
+    emoji: "microscope"
+    category: "domains"
+    subcategory: "biomedical"
+    keywords: ["genomics", "RNA-seq", "GWAS", "molecular biology", "genetics", "bioinformatics"]
+    source: "wentor-research-plugins"
+---
+# Genomics Analysis Guide
+## Overview
+Genomic data analysis is the computational backbone of modern molecular biology. From identifying disease-associated variants through Genome-Wide Association Studies (GWAS) to quantifying gene expression with RNA-seq, these workflows transform raw sequencing data into biological insights that drive discoveries in medicine, agriculture, and evolutionary biology.
+This guide covers the three most common genomic analysis workflows: RNA-seq differential expression analysis, GWAS for variant-trait associations, and variant calling from whole-genome sequencing (WGS) data. Each workflow is described with tool recommendations, command-line examples, and downstream analysis steps in R and Python.
+The emphasis is on reproducibility and best practices. Genomic analyses involve many sequential steps, and errors in early stages propagate through the entire pipeline. Following standardized workflows -- like those from the Broad Institute, ENCODE, and Bioconductor -- reduces the risk of methodological errors.
+## RNA-seq Analysis Pipeline
+### Workflow Overview
+```
+Raw FASTQ files
+    |
+    v
+[Quality Control] --> FastQC, MultiQC
+    |
+    v
+[Trimming] --> Trimmomatic, fastp
+    |
+    v
+[Alignment] --> STAR, HISAT2
+    |
+    v
+[Quantification] --> featureCounts, Salmon
+    |
+    v
+[Differential Expression] --> DESeq2, edgeR
+    |
+    v
+[Pathway Analysis] --> clusterProfiler, GSEA
+```
+### Step 1: Quality Control
+```bash
+# Run FastQC on all FASTQ files
+fastqc -t 8 -o qc_results/ raw_data/*.fastq.gz
+# Aggregate QC reports
+multiqc qc_results/ -o multiqc_report/
+```
+### Step 2: Read Trimming
+```bash
+# fastp for quality trimming and adapter removal
+fastp \
+  --in1 sample_R1.fastq.gz \
+  --in2 sample_R2.fastq.gz \
+  --out1 trimmed_R1.fastq.gz \
+  --out2 trimmed_R2.fastq.gz \
+  --detect_adapter_for_pe \
+  --thread 8 \
+  --html fastp_report.html
+```
+### Step 3: Alignment with STAR
+```bash
+# Build genome index (one time)
+STAR --runMode genomeGenerate \
+  --genomeDir star_index/ \
+  --genomeFastaFiles genome.fa \
+  --sjdbGTFfile annotations.gtf \
+  --runThreadN 16
+# Align reads
+STAR --runMode alignReads \
+  --genomeDir star_index/ \
+  --readFilesIn trimmed_R1.fastq.gz trimmed_R2.fastq.gz \
+  --readFilesCommand zcat \
+  --outSAMtype BAM SortedByCoordinate \
+  --quantMode GeneCounts \
+  --outFileNamePrefix sample_ \
+  --runThreadN 16
+```
+### Step 4: Differential Expression with DESeq2
+```r
+library(DESeq2)
+# Load count matrix and sample info
+counts <- read.csv("gene_counts.csv", row.names = 1)
+coldata <- read.csv("sample_info.csv", row.names = 1)
+# Create DESeq2 object
+dds <- DESeqDataSetFromMatrix(
+  countData = counts,
+  colData = coldata,
+  design = ~ condition
+)
+# Filter low-count genes
+keep <- rowSums(counts(dds) >= 10) >= 3
+dds <- dds[keep, ]
+# Run differential expression
+dds <- DESeq(dds)
+res <- results(dds, contrast = c("condition", "treated", "control"),
+               alpha = 0.05)
+# Summary
+summary(res)
+# Export significant genes
+sig_genes <- subset(as.data.frame(res), padj < 0.05 & abs(log2FoldChange) > 1)
+write.csv(sig_genes, "significant_genes.csv")
+```
+## GWAS Pipeline
+### Workflow Overview
+```
+Genotype Data (VCF/PLINK)
+    |
+    v
+[Quality Control] --> Sample/variant filtering
+    |
+    v
+[Population Stratification] --> PCA
+    |
+    v
+[Association Testing] --> PLINK2, REGENIE
+    |
+    v
+[Multiple Testing Correction] --> Bonferroni, FDR
+    |
+    v
+[Visualization] --> Manhattan plot, QQ plot
+```
+### QC with PLINK2
+```bash
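+# Sample and variant QC
+#   --mind 0.05  remove samples with >5% missing genotypes
+#   --geno 0.02  remove variants with >2% missing calls
+#   --maf 0.01   remove rare variants (MAF < 1%)
+#   --hwe 1e-6   remove variants failing the Hardy-Weinberg test at p < 1e-6
+# (comments cannot follow a trailing backslash, so they are listed here)
+plink2 \
+  --bfile dataset \
+  --mind 0.05 \
+  --geno 0.02 \
+  --maf 0.01 \
+  --hwe 1e-6 \
+  --make-bed \
+  --out dataset_qc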
+# LD pruning for PCA
+plink2 \
+  --bfile dataset_qc \
+  --indep-pairwise 50 5 0.2 \
+  --out pruned
+# PCA for population stratification
+plink2 \
+  --bfile dataset_qc \
+  --extract pruned.prune.in \
+  --pca 10 \
+  --out pca_results
+```
+### Association Testing
+```bash
+# Linear/logistic regression with covariates
+plink2 \
+  --bfile dataset_qc \
+  --glm \
+  --pheno phenotypes.txt \
+  --covar pca_results.eigenvec \
+  --covar-col-nums 3-12 \
+  --out gwas_results
+```
+### Manhattan Plot in Python
+```python
+import pandas as pd
+import matplotlib.pyplot as plt
+import numpy as np
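+def manhattan_plot(gwas_file, output='manhattan.pdf'):
+    """Manhattan plot from a tab-separated results file.
+    Expects columns named CHR, BP, and P (PLINK 1.9-style); PLINK2 --glm
+    output typically names them #CHROM and POS, so rename those first.
+    """
+    df = pd.read_csv(gwas_file, sep='\t')
+    df['-log10p'] = -np.log10(df['P'])
+    # Assign cumulative positions so chromosomes sit side by side on the x-axis
+    df = df.sort_values(['CHR', 'BP'])
+    df['pos_cum'] = 0
+    offset = 0
+    tick_positions, tick_labels = [], []
+    for chrom in df['CHR'].unique():
+        mask = df['CHR'] == chrom
+        df.loc[mask, 'pos_cum'] = df.loc[mask, 'BP'] + offset
+        start = df.loc[mask, 'pos_cum'].min()
+        offset = df.loc[mask, 'pos_cum'].max()
+        # Label each chromosome at the midpoint of its span
+        tick_positions.append((start + offset) / 2)
+        tick_labels.append(str(chrom))
+    fig, ax = plt.subplots(figsize=(16, 5))
+    colors = ['#3B82F6', '#94A3B8']
+    for i, chrom in enumerate(df['CHR'].unique()):
+        subset = df[df['CHR'] == chrom]
+        ax.scatter(subset['pos_cum'], subset['-log10p'],
+                   s=2, color=colors[i % 2], alpha=0.7)
+    # Conventional genome-wide significance threshold
+    ax.axhline(-np.log10(5e-8), color='red', linestyle='--', linewidth=0.8)
+    ax.set_xticks(tick_positions)
+    ax.set_xticklabels(tick_labels)
+    ax.set_xlabel('Chromosome')
+    ax.set_ylabel('-log10(p-value)')
+    fig.savefig(output, dpi=300, bbox_inches='tight')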
+```
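The workflow overview also lists a QQ plot. A common numeric companion to it is the genomic inflation factor (lambda GC), which compares the median observed test statistic with its expectation under the null. The following is a minimal standard-library sketch, assuming two-sided p-values strictly between 0 and 1. The function name `genomic_inflation` is illustrative and not part of the package.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Genomic inflation factor (lambda GC) from association p-values.

    Each two-sided p-value is converted to a 1-df chi-square statistic
    (the square of the matching standard-normal quantile). Lambda is the
    median observed statistic divided by the median expected under the null.
    """
    std_normal = NormalDist()
    chi2_stats = [std_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    expected_median = std_normal.inv_cdf(0.25) ** 2  # about 0.455
    return median(chi2_stats) / expected_median


# Evenly spaced p-values mimic a well-calibrated null distribution
null_p = [(i + 0.5) / 1000 for i in range(1000)]
print(f"lambda (null): {genomic_inflation(null_p):.3f}")

# Systematically smaller p-values push lambda above 1
inflated_p = [p ** 2 for p in null_p]
print(f"lambda (inflated): {genomic_inflation(inflated_p):.3f}")
```

Values well above 1 usually prompt a second look at population stratification or relatedness before the hits are interpreted.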
+## Variant Calling Pipeline
+### GATK Best Practices
+```bash
+# Mark duplicates
+gatk MarkDuplicates \
+  -I aligned.bam \
+  -O dedup.bam \
+  -M metrics.txt
+# Base quality score recalibration
+gatk BaseRecalibrator \
+  -I dedup.bam \
+  -R reference.fa \
+  --known-sites dbsnp.vcf \
+  -O recal_table.txt
+gatk ApplyBQSR \
+  -I dedup.bam \
+  -R reference.fa \
+  --bqsr-recal-file recal_table.txt \
+  -O recal.bam
+# Call variants
+gatk HaplotypeCaller \
+  -I recal.bam \
+  -R reference.fa \
+  -O variants.g.vcf \
+  -ERC GVCF
+```
+## Best Practices
+- **Use containerized workflows.** Nextflow + Docker/Singularity ensures reproducibility across environments.
+- **Document every parameter.** Small changes in alignment settings can significantly affect downstream results.
+- **Apply appropriate multiple testing corrections.** Genome-wide significance is p < 5e-8 for GWAS.
+- **Validate findings in independent cohorts.** Replication is essential before biological interpretation.
+- **Archive raw data and analysis scripts.** Deposit in GEO (expression) or dbGaP (genotypes) for reproducibility.
+- **Use established pipelines (nf-core).** Community-maintained Nextflow pipelines encode best practices.
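The two correction families named in the GWAS workflow (Bonferroni and FDR) can be sketched in a few lines. This is a minimal pure-Python illustration, assuming independent tests. The function names and p-values are hypothetical, and in a real analysis the adjusted values reported by the analysis tool itself (for example the `padj` column used in the DESeq2 step) are the ones to rely on.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that controls the family-wise error rate."""
    return alpha / n_tests


def benjamini_hochberg(p_values, alpha=0.05):
    """Return one reject flag per p-value under Benjamini-Hochberg FDR control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


# 0.05 spread over one million independent tests gives the GWAS convention
gwas_threshold = bonferroni_threshold(0.05, 1_000_000)
print(f"Bonferroni threshold: {gwas_threshold:.1e}")

# Hypothetical p-values from a small screen
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
flags = benjamini_hochberg(p_vals, alpha=0.05)
print([p for p, keep in zip(p_vals, flags) if keep])
```

Bonferroni suits GWAS, where a single false positive is costly. FDR control is the usual choice for expression screens, where a small share of false discoveries is tolerated.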
+## References
+- [DESeq2 Vignette](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) -- Differential expression analysis
+- [GATK Best Practices](https://gatk.broadinstitute.org/hc/en-us/sections/360007226651-Best-Practices-Workflows) -- Variant calling
+- [PLINK2 Documentation](https://www.cog-genomics.org/plink/2.0/) -- Genetic association analysis
+- [nf-core Pipelines](https://nf-co.re/) -- Community Nextflow workflows
+- [RNA-seq Analysis Tutorial](https://rnabio.org/) -- Griffith Lab comprehensive tutorial

package/skills/domains/business/market-analysis-guide/SKILL.md ADDED

@@ -0,0 +1,112 @@
+---
+name: market-analysis-guide
+description: "Structured frameworks for market sizing, competitive analysis, and strategic ..."
+metadata:
+  openclaw:
+    emoji: "chart_with_upwards_trend"
+    category: "domains"
+    subcategory: "business"
+    keywords: ["market analysis", "strategic management", "operations management", "competitive analysis", "market sizing"]
+    source: "wentor"
+---
+# Market Analysis Guide
+A comprehensive skill for conducting rigorous market analysis in academic and applied research contexts. This guide covers quantitative market sizing, competitive landscape mapping, and strategic positioning frameworks grounded in peer-reviewed methodologies.
+## Market Sizing Methodologies
+Market sizing is the foundation of any credible market analysis. There are two primary approaches, and robust research typically employs both for triangulation.
+**Top-Down Approach (TAM/SAM/SOM)**
+Start with the total addressable market and narrow systematically:
+```
+TAM (Total Addressable Market)
+  -> SAM (Serviceable Available Market)
+    -> SOM (Serviceable Obtainable Market)
+Example calculation:
+  TAM = Global higher-education EdTech spend = $340B (2025, HolonIQ)
+  SAM = AI-powered research tools segment  = $12B
+  SOM = Realistic capture in Year 3        = $120M (1% of SAM)
+```
+**Bottom-Up Approach**
+Build estimates from unit economics:
+```python
+# Bottom-up market sizing
+users_in_target_segment = 8_000_000  # global PhD + postdoc researchers
+adoption_rate = 0.05                  # 5% in first 3 years
+avg_revenue_per_user = 180            # USD/year
+bottom_up_estimate = users_in_target_segment * adoption_rate * avg_revenue_per_user
+# Result: $72,000,000
+```
+Always cite the data sources for each assumption. Use government statistics (e.g., NSF, Eurostat), industry reports (Gartner, McKinsey), and published academic datasets.
+## Competitive Analysis Frameworks
+### Porter's Five Forces
+Apply Porter's framework systematically to map industry structure:
+| Force | Key Questions | Data Sources |
+|-------|--------------|--------------|
+| Rivalry | How many direct competitors? Market concentration (HHI)? | Crunchbase, SEC filings |
+| New Entrants | Capital requirements? Regulatory barriers? | Patent databases, regulatory filings |
+| Substitutes | What alternatives exist? Switching costs? | User surveys, app store data |
+| Buyer Power | Customer concentration? Price sensitivity? | Industry reports, interviews |
+| Supplier Power | Input scarcity? Vendor lock-in? | Supply chain databases |
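The market concentration measure named in the Rivalry row, the Herfindahl-Hirschman Index (HHI), is the sum of squared market shares. The following is a minimal sketch, assuming shares are expressed in percent and cover the whole market. The function name and the share values are illustrative and not part of the package.

```python
def herfindahl_hirschman_index(market_shares_pct):
    """HHI: sum of squared market shares, with shares given in percent (0-100)."""
    total = sum(market_shares_pct)
    if abs(total - 100) > 1e-6:
        raise ValueError(f"shares should sum to 100, got {total}")
    return sum(share ** 2 for share in market_shares_pct)


# Hypothetical shares for four competitors
shares = [45, 28, 18, 9]
hhi = herfindahl_hirschman_index(shares)
print(f"HHI = {hhi}")
```

On this scale the index runs from near 0 (many small firms) to 10,000 (a single firm holding the whole market).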
+### SWOT and TOWS Matrix
+Go beyond basic SWOT by constructing a TOWS matrix that generates actionable strategies:
+```
+              Strengths (S)           Weaknesses (W)
+Opportunities  SO strategies           WO strategies
+  (O)          (use S to exploit O)    (overcome W via O)
+Threats        ST strategies           WT strategies
+  (T)          (use S to counter T)    (minimize W, avoid T)
+```
+## Data Collection and Validation
+Primary data collection methods for market analysis research:
+1. **Structured interviews** with industry experts (N >= 12 for saturation)
+2. **Survey instruments** validated with Cronbach's alpha >= 0.70
+3. **Conjoint analysis** for preference and willingness-to-pay estimation
+4. **Web scraping** of pricing pages, job postings, and product changelogs
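The Cronbach's alpha criterion for survey instruments can be checked directly from item-level responses. The following is a minimal standard-library sketch, assuming complete responses (no missing values) and items scored in the same direction. The function name `cronbach_alpha` and the Likert data are illustrative and not part of the package.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items, each a list of respondent scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(item_scores)
    item_variances = [variance(item) for item in item_scores]
    # Total score per respondent across all items
    totals = [sum(scores) for scores in zip(*item_scores)]
    return k / (k - 1) * (1 - sum(item_variances) / variance(totals))


# Hypothetical 5-point Likert responses: 3 items x 6 respondents
items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 4],
    [5, 5, 2, 4, 3, 5],
]
alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.3f}, meets 0.70 threshold: {alpha >= 0.70}")
```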
+Secondary data sources to cross-validate:
+- Statista, IBISWorld, Grand View Research for market reports
+- USPTO/EPO patent filings for technology trajectory analysis
+- PitchBook/Crunchbase for funding and M&A activity
+## Reporting and Visualization
+Present findings using clear, reproducible visualizations:
+```python
+import matplotlib.pyplot as plt
+segments = ['Segment A', 'Segment B', 'Segment C', 'Segment D']
+sizes = [45, 28, 18, 9]
+colors = ['#3B82F6', '#EF4444', '#10B981', '#F59E0B']
+fig, ax = plt.subplots(figsize=(8, 6))
+ax.barh(segments, sizes, color=colors)
+ax.set_xlabel('Market Share (%)')
+ax.set_title('Competitive Landscape by Segment')
+plt.tight_layout()
+plt.savefig('market_share.png', dpi=300)
+```
+Always include confidence intervals or sensitivity ranges for quantitative estimates. A well-structured market analysis report should contain an executive summary, methodology section, findings with visualizations, and a limitations discussion.
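One simple way to produce a sensitivity range is to recompute the bottom-up estimate over low, base, and high values of each assumption. The following is a minimal sketch: the base case reuses the figures from the bottom-up example in this guide, while the low and high values are hypothetical placeholders that would need sourcing in a real report.

```python
from itertools import product


def bottom_up_estimate(users, adoption_rate, revenue_per_user):
    """Annual revenue estimate from unit economics."""
    return users * adoption_rate * revenue_per_user


# (low, base, high) per assumption; low and high values are hypothetical
scenarios = {
    "users": (6_000_000, 8_000_000, 10_000_000),
    "adoption_rate": (0.03, 0.05, 0.08),
    "revenue_per_user": (120, 180, 240),
}

# Evaluate every combination of low/base/high assumptions
estimates = [
    bottom_up_estimate(users, rate, revenue)
    for users, rate, revenue in product(*scenarios.values())
]
base = bottom_up_estimate(8_000_000, 0.05, 180)
low, high = min(estimates), max(estimates)
print(f"Base: ${base:,.0f}  Range: ${low:,.0f} to ${high:,.0f}")
```

The full grid gives a worst-case to best-case envelope. Varying one assumption at a time instead shows which input drives the spread.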