npm - @wentorai/research-plugins - Versions diffs - 1.0.0 - Mend

@wentorai/research-plugins 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (252)

package/skills/analysis/statistics/hypothesis-testing-guide/SKILL.md ADDED

@@ -0,0 +1,210 @@
+---
+name: hypothesis-testing-guide
+description: "Statistical hypothesis testing, power analysis, and significance reporting"
+metadata:
+  openclaw:
+    emoji: "📈"
+    category: "analysis"
+    subcategory: "statistics"
+    keywords: ["hypothesis testing", "significance testing", "t-test", "ANOVA", "sample size calculation", "power analysis"]
+    source: "N/A"
+---
+# Hypothesis Testing Guide
+## Overview
+Hypothesis testing is the backbone of empirical research. It provides a principled framework for deciding whether observed differences in data reflect genuine effects or merely random variation. Misuse of hypothesis tests -- p-hacking, ignoring assumptions, confusing statistical and practical significance -- is a leading cause of irreproducible findings.
+This guide covers the core hypothesis testing framework, the most commonly used tests across disciplines, assumption checking, effect size reporting, power analysis for sample size planning, and multiple comparison corrections. Each test is accompanied by Python code using scipy, statsmodels, and pingouin, ready to integrate into research workflows.
+The goal is not just to help you run tests, but to help you run the right test correctly and report results following modern standards (APA 7th edition, journal best practices).
+## The Hypothesis Testing Framework
+### Step-by-Step Procedure
+1. **State hypotheses.** Define H0 (null: no effect) and H1 (alternative: effect exists).
+2. **Choose significance level.** Typically alpha = 0.05, but justify your choice.
+3. **Select the appropriate test.** Based on data type, distribution, and design.
+4. **Check assumptions.** Normality, homogeneity of variance, independence.
+5. **Compute test statistic and p-value.**
+6. **Report effect size and confidence interval.** p-values alone are insufficient.
+7. **Make a decision.** Reject or fail to reject H0, with practical interpretation.
+### Common Errors
+| Error Type | Definition | Probability |
+|-----------|-----------|-------------|
+| Type I (False Positive) | Reject H0 when it is true | alpha (usually 0.05) |
+| Type II (False Negative) | Fail to reject H0 when it is false | beta (conventional design target: 0.20) |
+| Power | Probability of correctly detecting an effect | 1 - beta (target: 0.80) |
+## Test Selection Guide
+| Research Question | Data Type | Groups | Test |
+|-------------------|-----------|--------|------|
+| Two group means differ? | Continuous, normal | 2 independent | Independent t-test |
+| Before/after difference? | Continuous, normal | 2 paired | Paired t-test |
+| Multiple group means differ? | Continuous, normal | 3+ independent | One-way ANOVA |
+| Two group medians differ? | Ordinal / non-normal | 2 independent | Mann-Whitney U |
+| Before/after (non-normal)? | Ordinal / non-normal | 2 paired | Wilcoxon signed-rank |
+| Multiple groups (non-normal)? | Ordinal / non-normal | 3+ independent | Kruskal-Wallis |
+| Association between categories? | Categorical | 2 variables | Chi-square test |
+| Correlation? | Continuous | 2 variables | Pearson or Spearman |
+## Running Tests in Python
+### Independent Samples t-Test
+```python
+from scipy import stats
+import numpy as np
+import pingouin as pg
+# Generate example data (np.random.normal takes `size`, not `n`)
+control = np.random.normal(50, 10, size=30)
+treatment = np.random.normal(55, 10, size=30)
+# Check normality assumption
+stat_c, p_c = stats.shapiro(control)
+stat_t, p_t = stats.shapiro(treatment)
+print(f"Normality p-values: control={p_c:.3f}, treatment={p_t:.3f}")
+# Check homogeneity of variance
+stat_l, p_l = stats.levene(control, treatment)
+print(f"Levene's test p={p_l:.3f}")
+# Run t-test (falls back to Welch's test when Levene's test rejects equal variances)
+t_stat, p_val = stats.ttest_ind(control, treatment, equal_var=(p_l > 0.05))
+# Effect size (Cohen's d); ddof=1 gives the sample variances that the pooled formula expects
+cohens_d = (treatment.mean() - control.mean()) / np.sqrt(
+    ((len(control)-1)*control.var(ddof=1) + (len(treatment)-1)*treatment.var(ddof=1))
+    / (len(control) + len(treatment) - 2)
+)
+print(f"t={t_stat:.3f}, p={p_val:.4f}, Cohen's d={cohens_d:.3f}")
+```
+### One-Way ANOVA with Post-Hoc Tests
+```python
+import pandas as pd
+df = pd.DataFrame({
+    'score': np.concatenate([
+        np.random.normal(50, 10, 30),
+        np.random.normal(55, 10, 30),
+        np.random.normal(60, 10, 30)
+    ]),
+    'group': np.repeat(['A', 'B', 'C'], 30)
+})
+# ANOVA
+aov = pg.anova(data=df, dv='score', between='group', detailed=True)
+print(aov)
+# Post-hoc pairwise comparisons (Tukey HSD)
+posthoc = pg.pairwise_tukey(data=df, dv='score', between='group')
+print(posthoc[['A', 'B', 'diff', 'p-tukey', 'hedges']])
+```
+### Chi-Square Test of Independence
+```python
+# Contingency table
+observed = pd.DataFrame(
+    [[45, 30], [25, 50]],
+    index=['Method A', 'Method B'],
+    columns=['Success', 'Failure']
+)
+chi2, p, dof, expected = stats.chi2_contingency(observed)
+cramers_v = np.sqrt(chi2 / (observed.values.sum() * (min(observed.shape) - 1)))
+print(f"chi2={chi2:.3f}, p={p:.4f}, Cramer's V={cramers_v:.3f}")
+```
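For intuition, the statistic can be reproduced by hand. The following is a minimal standard-library sketch using the same 2x2 counts as the example; the helper name `chi_square_2d` is hypothetical. It computes the uncorrected Pearson chi-square, whereas `stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so the two values will differ unless `correction=False` is passed.

```python
import math

def chi_square_2d(table):
    """Uncorrected Pearson chi-square, degrees of freedom, and Cramer's V
    for an r x c table of counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    k = min(len(table), len(table[0]))
    cramers_v = math.sqrt(chi2 / (total * (k - 1)))
    return chi2, dof, cramers_v

chi2, dof, v = chi_square_2d([[45, 30], [25, 50]])
print(f"chi2={chi2:.3f}, dof={dof}, Cramer's V={v:.3f}")
```

Every expected count here is 35 or 40, so each cell contributes 100/35 or 100/40 to the statistic, which makes the result easy to check on paper.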
+## Power Analysis and Sample Size
+Power analysis answers: "How many participants do I need?"
+```python
+from statsmodels.stats.power import TTestIndPower, FTestAnovaPower
+# For a two-sample t-test
+analysis = TTestIndPower()
+# Calculate required sample size
+n = analysis.solve_power(
+    effect_size=0.5,   # Cohen's d (medium effect)
+    alpha=0.05,
+    power=0.80,
+    ratio=1.0,         # Equal group sizes
+    alternative='two-sided'
+)
+print(f"Required n per group: {int(np.ceil(n))}")
+# Power curve
+import matplotlib.pyplot as plt
+sample_sizes = np.arange(10, 200, 5)
+powers = [analysis.power(effect_size=0.5, nobs1=n, ratio=1.0, alpha=0.05)
+          for n in sample_sizes]
+fig, ax = plt.subplots()
+ax.plot(sample_sizes, powers)
+ax.axhline(0.8, color='red', linestyle='--', label='Power = 0.80')
+ax.set_xlabel('Sample Size per Group')
+ax.set_ylabel('Statistical Power')
+ax.legend()
+fig.savefig('power_curve.pdf')
+```
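As a rough cross-check on the solver, the textbook normal approximation n = 2 * ((z_(1-alpha/2) + z_power) / d)^2 can be evaluated with the standard library alone. This is a sketch under that approximation, not the calculation statsmodels performs; the function names `approx_n_per_group` and `approx_power` are hypothetical.

```python
import math
from statistics import NormalDist

def approx_n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison with equal group sizes."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power for the same design."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_group / 2)  # expected value of the z statistic
    # Probability of landing in either rejection region
    return z.cdf(shift - z_alpha) + z.cdf(-shift - z_alpha)

n_approx = approx_n_per_group(0.5)
print(n_approx, round(approx_power(0.5, n_approx), 3))
```

The approximation ignores the heavier tails of the t distribution, so a t-based solver can be expected to ask for slightly more participants. With `d = 0` the power function reduces to alpha, which is a quick sanity check on the formula.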
+### Effect Size Reference Table
+| Effect Size | Small | Medium | Large |
+|-------------|-------|--------|-------|
+| Cohen's d (t-test) | 0.2 | 0.5 | 0.8 |
+| eta-squared (ANOVA) | 0.01 | 0.06 | 0.14 |
+| Cramer's V (chi-square) | 0.1 | 0.3 | 0.5 |
+| Pearson r (correlation) | 0.1 | 0.3 | 0.5 |
+## Multiple Comparison Corrections
+When running multiple tests, the family-wise error rate inflates. Use corrections:
+```python
+from statsmodels.stats.multitest import multipletests
+p_values = [0.01, 0.04, 0.03, 0.08, 0.002]
+# Bonferroni (conservative)
+reject_bonf, pvals_bonf, _, _ = multipletests(p_values, method='bonferroni')
+# Benjamini-Hochberg FDR (less conservative)
+reject_bh, pvals_bh, _, _ = multipletests(p_values, method='fdr_bh')
+for i, p in enumerate(p_values):
+    print(f"p={p:.3f} | Bonferroni: {pvals_bonf[i]:.3f} ({reject_bonf[i]}) "
+          f"| BH-FDR: {pvals_bh[i]:.3f} ({reject_bh[i]})")
+```
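To make the adjustments less of a black box, both corrections can be written out in a few lines. This is a minimal standard-library sketch of the usual definitions (the function names are hypothetical), applied to the same five p-values.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]

def benjamini_hochberg(p_values):
    """BH-adjusted p-values: p * m / rank, made monotone from the largest p down."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        # An adjusted value may never exceed that of a larger raw p-value
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

p_values = [0.01, 0.04, 0.03, 0.08, 0.002]
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
for p, pb, ph in zip(p_values, bonf, bh):
    print(f"p={p:.3f} | Bonferroni: {pb:.3f} | BH-FDR: {ph:.3f}")
```

Bonferroni scales every p-value by the full number of tests, while Benjamini-Hochberg scales by m / rank, which is why it is less conservative for all but the smallest p-value.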
+## Best Practices
+- **Always report effect sizes alongside p-values.** A significant p-value with a tiny effect size is rarely meaningful.
+- **Pre-register your analysis plan.** This prevents p-hacking and HARKing (Hypothesizing After Results are Known).
+- **Check assumptions before running parametric tests.** Use non-parametric alternatives when assumptions are violated.
+- **Use confidence intervals.** They convey both effect magnitude and precision.
+- **Report exact p-values (p = .032), not thresholds (p < .05).** The exception is very small values, which APA style reports as p < .001.
+- **Consider Bayesian alternatives.** Bayes factors can quantify evidence for H0, not just against it.
+- **Plan sample sizes a priori.** Power analysis should be done before data collection, not after.
+## References
+- [scipy.stats Documentation](https://docs.scipy.org/doc/scipy/reference/stats.html)
+- [pingouin Documentation](https://pingouin-stats.org/) -- Friendly statistics in Python
+- [statsmodels Documentation](https://www.statsmodels.org/)
+- [APA 7th Edition Statistics Reporting Guide](https://apastyle.apa.org/instructional-aids/numbers-statistics-guide.pdf)
+- [Statistical Tests Cheat Sheet](https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/)

package/skills/analysis/statistics/meta-analysis-guide/SKILL.md ADDED Viewed

@@ -0,0 +1,206 @@
+---
+name: meta-analysis-guide
+description: "Conduct systematic meta-analyses with effect size pooling and heterogeneity a..."
+metadata:
+  openclaw:
+    emoji: "balance_scale"
+    category: "analysis"
+    subcategory: "statistics"
+    keywords: ["meta-analysis", "statistical synthesis", "heterogeneity", "effect size", "forest plot"]
+    source: "wentor"
+---
+# Meta-Analysis Guide
+A skill for conducting rigorous meta-analyses: computing and pooling effect sizes, assessing heterogeneity, evaluating publication bias, and generating forest plots. Follows Cochrane Handbook and PRISMA guidelines.
+## Effect Size Computation
+### Common Effect Size Measures
+| Measure | Use Case | Formula | Interpretation |
+|---------|----------|---------|----------------|
+| Cohen's d | Mean difference (2 groups) | (M1 - M2) / S_pooled | 0.2 small, 0.5 medium, 0.8 large |
+| Hedges' g | d with small-sample correction | d * J(df) | Preferred over d for small N |
+| Pearson r | Correlation | r | 0.1 small, 0.3 medium, 0.5 large |
+| Odds Ratio | Binary outcomes | (a*d)/(b*c) | 1 = no effect |
+| Risk Ratio | Binary outcomes | (a/(a+b))/(c/(c+d)) | 1 = no effect |
+| SMD | Standardized mean difference | Same as Hedges' g | When scales differ |
+### Computing Effect Sizes in Python
+```python
+import numpy as np
+from scipy import stats  # used by random_effects_meta for the Q-test p-value
+from dataclasses import dataclass
+@dataclass
+class EffectSize:
+    estimate: float
+    variance: float
+    se: float
+    ci_lower: float
+    ci_upper: float
+    measure: str
+def cohens_d(m1: float, m2: float, sd1: float, sd2: float,
+              n1: int, n2: int) -> EffectSize:
+    """
+    Compute Hedges' g (bias-corrected Cohen's d) from group summary statistics.
+    Despite its name, this function returns g, not the uncorrected d.
+    """
+    # Pooled standard deviation
+    sd_pooled = np.sqrt(((n1-1)*sd1**2 + (n2-1)*sd2**2) / (n1+n2-2))
+    # Cohen's d
+    d = (m1 - m2) / sd_pooled
+    # Small-sample correction (Hedges' g)
+    df = n1 + n2 - 2
+    j = 1 - (3 / (4*df - 1))
+    g = d * j
+    # Variance of d, scaled by J^2 to give the variance of g
+    var_d = (n1+n2)/(n1*n2) + d**2 / (2*(n1+n2))
+    var_g = j**2 * var_d
+    se_g = np.sqrt(var_g)
+    return EffectSize(
+        estimate=g,
+        variance=var_g,
+        se=se_g,
+        ci_lower=g - 1.96*se_g,
+        ci_upper=g + 1.96*se_g,
+        measure='Hedges_g'
+    )
+def odds_ratio(a: int, b: int, c: int, d: int) -> EffectSize:
+    """
+    Compute log odds ratio from a 2x2 table.
+    a=treatment success, b=treatment failure, c=control success, d=control failure
+    """
+    # Add 0.5 continuity correction if any cell is 0
+    if any(x == 0 for x in [a, b, c, d]):
+        a, b, c, d = a+0.5, b+0.5, c+0.5, d+0.5
+    log_or = np.log((a*d) / (b*c))
+    var = 1/a + 1/b + 1/c + 1/d
+    se = np.sqrt(var)
+    return EffectSize(
+        estimate=log_or,
+        variance=var,
+        se=se,
+        ci_lower=log_or - 1.96*se,
+        ci_upper=log_or + 1.96*se,
+        measure='log_OR'
+    )
+```
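The measures table also lists the risk ratio, which has no function in the code. A minimal standard-library sketch is given here, mirroring the cell labels of `odds_ratio`; the name `log_risk_ratio`, the dict return type, and the choice to reuse the 0.5 continuity correction are assumptions made for illustration.

```python
import math

def log_risk_ratio(a, b, c, d):
    """
    Log risk ratio from a 2x2 table.
    a=treatment success, b=treatment failure, c=control success, d=control failure
    """
    # Same continuity correction as odds_ratio: add 0.5 if any cell is 0
    if any(x == 0 for x in (a, b, c, d)):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    n1, n2 = a + b, c + d
    log_rr = math.log((a / n1) / (c / n2))
    # Large-sample variance of the log risk ratio
    var = 1 / a - 1 / n1 + 1 / c - 1 / n2
    se = math.sqrt(var)
    return {
        "estimate": log_rr,
        "variance": var,
        "se": se,
        "ci_lower": log_rr - 1.96 * se,
        "ci_upper": log_rr + 1.96 * se,
        "measure": "log_RR",
    }

rr = log_risk_ratio(15, 85, 30, 70)
print(f"RR = {math.exp(rr['estimate']):.2f}, "
      f"95% CI [{math.exp(rr['ci_lower']):.2f}, {math.exp(rr['ci_upper']):.2f}]")
```

As with the odds ratio, pooling is done on the log scale and the result is exponentiated only for reporting.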
+## Fixed-Effect and Random-Effects Models
+### Inverse-Variance Pooling
+```python
+def random_effects_meta(effects: list[EffectSize]) -> dict:
+    """
+    Random-effects meta-analysis using DerSimonian-Laird estimator.
+    """
+    yi = np.array([e.estimate for e in effects])
+    vi = np.array([e.variance for e in effects])
+    wi = 1 / vi
+    k = len(effects)
+    # Fixed-effect estimate
+    fe_estimate = np.sum(wi * yi) / np.sum(wi)
+    # Q statistic for heterogeneity
+    Q = np.sum(wi * (yi - fe_estimate)**2)
+    df = k - 1
+    # DerSimonian-Laird tau-squared
+    C = np.sum(wi) - np.sum(wi**2) / np.sum(wi)
+    tau2 = max(0, (Q - df) / C)
+    # Random-effects weights
+    wi_re = 1 / (vi + tau2)
+    re_estimate = np.sum(wi_re * yi) / np.sum(wi_re)
+    re_se = np.sqrt(1 / np.sum(wi_re))
+    re_ci = (re_estimate - 1.96*re_se, re_estimate + 1.96*re_se)
+    # Heterogeneity statistics
+    I2 = max(0, (Q - df) / Q * 100) if Q > 0 else 0
+    H2 = Q / df if df > 0 else 1
+    return {
+        'pooled_effect': re_estimate,
+        'se': re_se,
+        'ci_95': re_ci,
+        'tau_squared': tau2,
+        'Q_statistic': Q,
+        'Q_df': df,
+        'Q_pvalue': 1 - stats.chi2.cdf(Q, df),
+        'I_squared': I2,
+        'H_squared': H2,
+        'interpretation': (
+            f"I-squared = {I2:.1f}%: "
+            + ('low' if I2 < 25 else 'moderate' if I2 < 75 else 'high')
+            + ' heterogeneity'
+        )
+    }
+```
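A small hand-checkable case can help confirm what the DerSimonian-Laird steps do. The sketch below restates the same formulas with the standard library only and runs them on three made-up studies with equal variances; the data and the name `dl_pool` are illustrative assumptions, not results from any real review.

```python
import math

def dl_pool(yi, vi):
    """DerSimonian-Laird pooling from effect estimates yi and their variances vi."""
    if len(yi) < 2:
        raise ValueError("need at least two studies to estimate tau-squared")
    wi = [1 / v for v in vi]
    sw = sum(wi)
    fixed = sum(w * y for w, y in zip(wi, yi)) / sw
    q = sum(w * (y - fixed) ** 2 for w, y in zip(wi, yi))
    df = len(yi) - 1
    c = sw - sum(w ** 2 for w in wi) / sw
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau2 to every within-study variance
    wi_re = [1 / (v + tau2) for v in vi]
    pooled = sum(w * y for w, y in zip(wi_re, yi)) / sum(wi_re)
    se = math.sqrt(1 / sum(wi_re))
    i2 = max(0.0, (q - df) / q * 100) if q > 0 else 0.0
    return {"fixed": fixed, "pooled": pooled, "se": se,
            "tau2": tau2, "Q": q, "I2": i2}

toy = dl_pool([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print({key: round(value, 4) for key, value in toy.items()})
```

With equal variances the pooled estimate stays at the simple mean, but the standard error grows from sqrt(0.04 / 3) under the fixed-effect model to sqrt(0.09 / 3) once the between-study variance of 0.05 is added to each study.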
+## Forest Plot
+```python
+import matplotlib.pyplot as plt
+def forest_plot(studies: list[dict], pooled: dict,
+                title: str = 'Forest Plot') -> plt.Figure:
+    """
+    Create a publication-quality forest plot.
+    Args:
+        studies: List of dicts with 'name', 'effect', 'ci_lower', 'ci_upper', 'weight'
+        pooled: Dict with 'pooled_effect', 'ci_95'
+    """
+    fig, ax = plt.subplots(figsize=(10, max(6, len(studies)*0.5)))
+    k = len(studies)
+    for i, study in enumerate(studies):
+        y = k - i
+        ax.plot([study['ci_lower'], study['ci_upper']], [y, y], 'b-', linewidth=1)
+        size = study.get('weight', 5) * 2
+        ax.plot(study['effect'], y, 'bs', markersize=max(3, min(size, 15)))
+        ax.text(-0.05, y, study['name'], ha='right', va='center', fontsize=9,
+                transform=ax.get_yaxis_transform())
+    # Pooled estimate (diamond)
+    pe = pooled['pooled_effect']
+    ci = pooled['ci_95']
+    ax.fill([ci[0], pe, ci[1], pe], [0.3, 0.6, 0.3, 0], 'r', alpha=0.7)
+    ax.axvline(x=0, color='gray', linestyle='--', linewidth=0.5)
+    ax.set_xlabel('Effect Size (Hedges g)')
+    ax.set_title(title)
+    ax.set_yticks([])
+    plt.tight_layout()
+    return fig
+```
+## Publication Bias Assessment
+Methods to assess and address publication bias:
+1. **Funnel plot**: Visual inspection for asymmetry
+2. **Egger's test**: Regression test for funnel plot asymmetry (p < 0.10 suggests small-study effects, which may reflect publication bias)
+3. **Trim-and-fill**: Imputes missing studies to correct for bias
+4. **p-curve analysis**: Tests whether significant results contain evidential value
+5. **Selection models**: Formally model the publication process (e.g., Vevea-Hedges)
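
Of the methods listed, Egger's test is the simplest to sketch: regress the standardized effect (effect / SE) on precision (1 / SE) and examine the intercept. The following standard-library sketch uses made-up inputs and a hypothetical function name, and it stops at the intercept and its standard error; a p-value would come from comparing their ratio to a t distribution with k - 2 degrees of freedom (for example via `scipy.stats.t`).

```python
import math

def egger_regression(effects, ses):
    """OLS of (effect / se) on (1 / se).
    Returns (intercept, slope, se_intercept); needs at least 3 studies."""
    k = len(effects)
    if k < 3:
        raise ValueError("need at least three studies")
    z = [y / s for y, s in zip(effects, ses)]
    prec = [1 / s for s in ses]
    mx, my = sum(prec) / k, sum(z) / k
    sxx = sum((x - mx) ** 2 for x in prec)
    sxy = sum((x - mx) * (y - my) for x, y in zip(prec, z))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid_ss = sum((y - intercept - slope * x) ** 2 for x, y in zip(prec, z))
    sigma2 = resid_ss / (k - 2)
    se_intercept = math.sqrt(sigma2 * (1 / k + mx ** 2 / sxx))
    return intercept, slope, se_intercept

ses = [0.1, 0.2, 0.3, 0.4]
# Illustrative asymmetric data: less precise studies report larger effects
biased_effects = [0.5 + 2 * s for s in ses]
intercept, slope, se_int = egger_regression(biased_effects, ses)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

An intercept near zero is what a symmetric funnel would produce; in this constructed example the intercept recovers the built-in dependence of effect size on standard error.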
+## Reporting Standards
+Follow PRISMA 2020 guidelines for reporting:
+- Report all effect sizes with 95% CIs
+- Report Q, I-squared, and tau-squared for heterogeneity
+- Include forest plots for all primary outcomes
+- Report funnel plots and publication bias tests
+- Provide subgroup analyses and sensitivity analyses (leave-one-out)
+- Register the protocol on PROSPERO before conducting the review

package/skills/analysis/statistics/nonparametric-tests-guide/SKILL.md ADDED Viewed

@@ -0,0 +1,221 @@
+---
+name: nonparametric-tests-guide
+description: "Apply Mann-Whitney, Kruskal-Wallis, and other nonparametric methods"
+metadata:
+  openclaw:
+    emoji: "chart_with_upwards_trend"
+    category: "analysis"
+    subcategory: "statistics"
+    keywords: ["nonparametric tests", "Mann-Whitney", "Kruskal-Wallis", "Wilcoxon", "rank-based tests", "distribution-free"]
+    source: "wentor-research-plugins"
+---
+# Nonparametric Tests Guide
+A skill for selecting and applying nonparametric statistical tests when data violate parametric assumptions. Covers rank-based tests for group comparisons, correlation, and paired data, with implementation examples and guidance on reporting.
+## When to Use Nonparametric Tests
+### Decision Criteria
+```
+Use nonparametric tests when:
+  - Data are ordinal (Likert scales, rankings)
+  - Distribution is clearly non-normal (heavy skew, outliers)
+  - Sample size is very small (n < 15-20 per group)
+  - Homogeneity of variance is violated
+  - You are analyzing ranks or medians rather than means
+Use parametric tests when:
+  - Data are approximately normal (or n > 30 by CLT)
+  - Variance is homogeneous across groups
+  - You need greater statistical power
+  - The parametric assumptions are reasonably met
+```
+### Test Selection Guide
+| Parametric Test | Nonparametric Alternative | Use Case |
+|----------------|--------------------------|----------|
+| Independent t-test | Mann-Whitney U | Compare 2 independent groups |
+| Paired t-test | Wilcoxon signed-rank | Compare 2 related samples |
+| One-way ANOVA | Kruskal-Wallis H | Compare 3+ independent groups |
+| Repeated measures ANOVA | Friedman test | Compare 3+ related samples |
+| Pearson correlation | Spearman rank correlation | Measure association |
+| Chi-square test | Fisher's exact test | Compare proportions (small n) |
+## Mann-Whitney U Test
+### Two Independent Groups
+```python
+from scipy import stats
+import numpy as np
+def mann_whitney_test(group_a: list, group_b: list) -> dict:
+    """
+    Perform Mann-Whitney U test for two independent groups.
+    Args:
+        group_a: Observations from group A
+        group_b: Observations from group B
+    """
+    statistic, p_value = stats.mannwhitneyu(
+        group_a, group_b, alternative="two-sided"
+    )
+    n_a, n_b = len(group_a), len(group_b)
+    # Rank-biserial correlation as effect size
+    r = 1 - (2 * statistic) / (n_a * n_b)
+    return {
+        "U_statistic": statistic,
+        "p_value": p_value,
+        "n_a": n_a,
+        "n_b": n_b,
+        "median_a": np.median(group_a),
+        "median_b": np.median(group_b),
+        "effect_size_r": abs(r),
+        "effect_interpretation": (
+            "small" if abs(r) < 0.3
+            else "medium" if abs(r) < 0.5
+            else "large"
+        )
+    }
+# Example usage
+control = [12, 15, 14, 10, 13, 11, 16, 9, 14, 12]
+treatment = [18, 22, 19, 17, 20, 21, 16, 23, 19, 20]
+result = mann_whitney_test(control, treatment)
+print(f"U = {result['U_statistic']}, p = {result['p_value']:.4f}")
+print(f"Effect size r = {result['effect_size_r']:.3f} ({result['effect_interpretation']})")
+```
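The U statistic has a direct interpretation that is easy to lose behind the library call: it counts, over all cross-group pairs, how often an observation from the first group exceeds one from the second, with ties counting one half. A minimal standard-library sketch of that definition follows (the function name is hypothetical), using the same example data.

```python
def mann_whitney_u(group_a, group_b):
    """U for group_a by direct pair counting, plus the rank-biserial correlation."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5  # ties count as half a win
    n_pairs = len(group_a) * len(group_b)
    r_rb = 1 - 2 * u_a / n_pairs
    return u_a, r_rb

control = [12, 15, 14, 10, 13, 11, 16, 9, 14, 12]
treatment = [18, 22, 19, 17, 20, 21, 16, 23, 19, 20]
u, r_rb = mann_whitney_u(control, treatment)
print(f"U = {u}, rank-biserial r = {r_rb:.2f}")
```

Dividing U by the number of pairs gives the proportion of pairs in which the first group wins, so the rank-biserial correlation is simply a rescaling of that proportion to the range -1 to 1.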
+## Kruskal-Wallis H Test
+### Three or More Independent Groups
+```python
+def kruskal_wallis_with_posthoc(*groups) -> dict:
+    """
+    Perform Kruskal-Wallis test with pairwise Mann-Whitney post-hoc
+    comparisons (Bonferroni-adjusted). Note: this is not Dunn's test.
+    Args:
+        *groups: Variable number of group data arrays
+    """
+    # Omnibus test
+    h_stat, p_value = stats.kruskal(*groups)
+    result = {
+        "H_statistic": h_stat,
+        "p_value": p_value,
+        "n_groups": len(groups),
+        "group_medians": [np.median(g) for g in groups]
+    }
+    # If significant, perform pairwise Mann-Whitney with Bonferroni correction
+    if p_value < 0.05:
+        n_comparisons = len(groups) * (len(groups) - 1) // 2
+        pairwise = []
+        for i in range(len(groups)):
+            for j in range(i + 1, len(groups)):
+                u, p = stats.mannwhitneyu(groups[i], groups[j], alternative="two-sided")
+                pairwise.append({
+                    "comparison": f"Group {i+1} vs Group {j+1}",
+                    "U": u,
+                    "p_raw": p,
+                    "p_adjusted": min(p * n_comparisons, 1.0),
+                    "significant": (p * n_comparisons) < 0.05
+                })
+        result["posthoc"] = pairwise
+    return result
+```
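The function reports H and p-values but no omnibus effect size. Two commonly cited options are eta-squared based on H, (H - k + 1) / (n - k), and epsilon-squared, H / (n - 1). The sketch below implements those formulas as an assumption about which definitions are wanted; the function name and the total sample size of 30 in the usage line are hypothetical.

```python
def kruskal_effect_sizes(h_stat, n_groups, n_total):
    """Effect sizes for a Kruskal-Wallis H statistic.
    eta_squared_H = (H - k + 1) / (n - k), floored at 0
    epsilon_squared = H / (n - 1)
    """
    if n_total <= n_groups:
        raise ValueError("n_total must exceed n_groups")
    eta2_h = (h_stat - n_groups + 1) / (n_total - n_groups)
    epsilon2 = h_stat / (n_total - 1)
    return {
        "eta_squared_H": max(0.0, eta2_h),
        "epsilon_squared": epsilon2,
    }

# H and group count as in a three-group design; n_total=30 is an assumed value
sizes = kruskal_effect_sizes(h_stat=15.32, n_groups=3, n_total=30)
print({name: round(value, 3) for name, value in sizes.items()})
```

Both quantities can be computed from values the function already has: `h_stat`, `len(groups)`, and the summed group lengths.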
+## Wilcoxon Signed-Rank Test
+### Paired or Repeated Measures
+```python
+def wilcoxon_signed_rank(before: list, after: list) -> dict:
+    """
+    Perform Wilcoxon signed-rank test for paired data.
+    Args:
+        before: Pre-intervention measurements
+        after: Post-intervention measurements
+    """
+    statistic, p_value = stats.wilcoxon(before, after)
+    n = len(before)
+    # Effect size: r = Z / sqrt(N), with |Z| back-calculated from the
+    # two-sided p-value (an approximation, not the test's own Z)
+    z_score = stats.norm.ppf(1 - p_value / 2)
+    r = z_score / np.sqrt(n)
+    differences = [a - b for a, b in zip(after, before)]
+    return {
+        "W_statistic": statistic,
+        "p_value": p_value,
+        "n_pairs": n,
+        "median_difference": np.median(differences),
+        "effect_size_r": abs(r)
+    }
+```
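An alternative effect size that avoids back-calculating Z from the p-value is the matched-pairs rank-biserial correlation, (W+ - W-) / (W+ + W-), built from the signed-rank sums themselves. This is a minimal standard-library sketch under that definition, with a hypothetical function name and made-up data; zero differences are dropped, and tied absolute differences receive average ranks.

```python
def signed_rank_sums(before, after):
    """Return (W_plus, W_minus, matched-pairs rank-biserial r)."""
    diffs = [a - b for a, b in zip(after, before) if a != b]  # drop zero differences
    if not diffs:
        raise ValueError("all paired differences are zero")
    ordered = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over a run of tied absolute differences
        while (j + 1 < len(ordered)
               and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]])):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    r_rb = (w_plus - w_minus) / (w_plus + w_minus)
    return w_plus, w_minus, r_rb

before = [10, 12, 9, 14, 11]
after = [13, 15, 8, 18, 16]
w_plus, w_minus, r_rb = signed_rank_sums(before, after)
print(f"W+ = {w_plus}, W- = {w_minus}, r = {r_rb:.3f}")
```

For a two-sided test, `stats.wilcoxon` reports the smaller of the two rank sums as its statistic, so the W values here can be checked against the library output.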
+## Spearman Rank Correlation
+### Monotonic Association
+```python
+def spearman_correlation(x: list, y: list) -> dict:
+    """
+    Compute Spearman rank correlation.
+    """
+    rho, p_value = stats.spearmanr(x, y)
+    return {
+        "rho": rho,
+        "p_value": p_value,
+        "interpretation": (
+            "negligible" if abs(rho) < 0.1
+            else "weak" if abs(rho) < 0.3
+            else "moderate" if abs(rho) < 0.5
+            else "strong" if abs(rho) < 0.7
+            else "very strong"
+        )
+    }
+```
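Spearman's rho is the Pearson correlation of the ranks, which explains why it captures monotonic rather than strictly linear association. A minimal standard-library sketch of that definition follows, with average ranks for ties; the helper names are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    ordered = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and values[ordered[j + 1]] == values[ordered[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

rho_swapped = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
rho_cubic = spearman_rho([1, 2, 3, 4], [1, 8, 27, 64])
print(f"rho (two swapped pairs) = {rho_swapped:.2f}, rho (cubic) = {rho_cubic:.2f}")
```

The cubic example is perfectly monotonic but not linear, so its rank correlation is 1 even though a Pearson correlation on the raw values would be lower.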
+## Reporting Nonparametric Results
+### APA-Style Reporting Examples
+```
+Mann-Whitney U:
+  "A Mann-Whitney U test indicated that treatment scores
+   (Mdn = 20.0) were significantly higher than control scores
+   (Mdn = 13.0), U = 5.0, p < .001, r = .82."
+Kruskal-Wallis:
+  "A Kruskal-Wallis H test showed a significant difference
+   in scores across the three conditions, H(2) = 15.32,
+   p < .001. Post-hoc pairwise comparisons with Bonferroni
+   correction revealed..."
+Wilcoxon Signed-Rank:
+  "A Wilcoxon signed-rank test showed that the intervention
+   significantly improved scores (Mdn_diff = 4.5),
+   W = 12.0, p = .003, r = .58."
+Spearman:
+  "There was a strong positive correlation between X and Y,
+   r_s = .72, p < .001."
+```
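The p-value formatting in these examples (no leading zero, three decimals, and `p < .001` for very small values) is easy to get wrong by hand. A small helper can apply it consistently; this is a sketch with a hypothetical function name, covering only the p-value portion of an APA-style report.

```python
def format_p_apa(p):
    """Format a p-value in APA style, e.g. 'p = .032' or 'p < .001'."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p < 0.001:
        return "p < .001"
    # Three decimals, then drop the leading zero
    return "p = " + f"{p:.3f}".lstrip("0")

for p in (0.032, 0.0004, 0.003):
    print(format_p_apa(p))
```

The helper can be dropped into the f-strings used in the example usage code so that printed results match the reporting templates.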
+### Effect Size Guidelines
+Always report effect sizes alongside p-values. For rank-biserial correlation r: small (0.1), medium (0.3), large (0.5). For Spearman rho, use standard correlation benchmarks. Effect sizes allow readers to judge practical significance independent of sample size.