npm - @wentorai/research-plugins - Versions diffs - 1.0.0 - Mend

@wentorai/research-plugins 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (252)

package/skills/analysis/dataviz/publication-figures-guide/SKILL.md ADDED

@@ -0,0 +1,238 @@
+---
+name: publication-figures-guide
+description: "Create journal-quality scientific figures with proper styling and accessibility"
+metadata:
+  openclaw:
+    emoji: "art"
+    category: "analysis"
+    subcategory: "dataviz"
+    keywords: ["scientific figure creation", "publication quality figure", "figure standards", "colorblind-friendly palette", "data visualization"]
+    source: "wentor"
+---
+# Publication Figures Guide
+A skill for creating publication-quality scientific figures that meet journal standards for resolution, formatting, accessibility, and visual clarity. Covers matplotlib workflows with journal-ready export settings.
+## Journal Figure Requirements
+### Common Standards
+| Requirement | Typical Spec | Notes |
+|------------|-------------|-------|
+| Resolution | 300-600 DPI | 300 DPI minimum for print |
+| File format | PDF, EPS, TIFF | Vector (PDF/EPS) preferred |
+| Color mode | CMYK for print, RGB for online | Check journal spec |
+| Max width | Single column: 3.3in / Double: 6.7in | Varies by journal |
+| Font size | 6-8pt minimum | Must be legible at final print size |
+| Line width | 0.5-1.5pt | Thin lines may not reproduce |
+| File size | Varies (often <10MB per figure) | TIFF can be large |
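To make the table concrete, the sketch below converts a figure size in inches and a DPI into the pixel dimensions a raster export would need, and converts widths given in millimetres to inches. The helper names `raster_size` and `mm_to_inches` and the 4:3 aspect ratio are illustrative assumptions, not journal requirements.

```python
def raster_size(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Return (width_px, height_px) for a figure of the given size in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def mm_to_inches(mm: float) -> float:
    """Convert millimetres to inches (journals often specify widths in mm)."""
    return mm / 25.4


# Single-column figure from the table (3.3 in wide) at an assumed 4:3 aspect ratio
width_in = 3.3
height_in = width_in * 3 / 4
print(raster_size(width_in, height_in, dpi=300))
print(raster_size(width_in, height_in, dpi=600))
```

Doubling the DPI quadruples the pixel count, which is one reason high-resolution TIFF files grow large quickly.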
+### Matplotlib Configuration for Publication
+```python
+import matplotlib.pyplot as plt
+import matplotlib as mpl
+import numpy as np
+def setup_publication_style(journal: str = 'nature'):
+    """
+    Configure matplotlib for publication-quality figures.
+    """
+    styles = {
+        'nature': {
+            'figure.figsize': (3.3, 2.5),    # single column
+            'font.size': 7,
+            'font.family': 'sans-serif',
+            'font.sans-serif': ['Arial', 'Helvetica'],
+            'axes.linewidth': 0.5,
+            'axes.labelsize': 8,
+            'xtick.labelsize': 7,
+            'ytick.labelsize': 7,
+            'legend.fontsize': 6,
+            'lines.linewidth': 1.0,
+            'lines.markersize': 4,
+            'savefig.dpi': 300,
+            'savefig.bbox': 'tight',
+            'savefig.pad_inches': 0.05,
+        },
+        'ieee': {
+            'figure.figsize': (3.5, 2.6),
+            'font.size': 8,
+            'font.family': 'serif',
+            'font.serif': ['Times New Roman', 'Times'],
+            'axes.linewidth': 0.5,
+            'axes.labelsize': 9,
+            'xtick.labelsize': 8,
+            'ytick.labelsize': 8,
+            'legend.fontsize': 7,
+            'lines.linewidth': 1.0,
+            'savefig.dpi': 300,
+        },
+        'acs': {
+            'figure.figsize': (3.25, 2.5),
+            'font.size': 7,
+            'font.family': 'sans-serif',
+            'font.sans-serif': ['Arial'],
+            'axes.linewidth': 0.5,
+            'savefig.dpi': 600,
+        }
+    }
+    style = styles.get(journal, styles['nature'])
+    mpl.rcParams.update(style)
+    return style
+setup_publication_style('nature')
+```
+## Colorblind-Friendly Palettes
+### Recommended Color Schemes
+```python
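+def get_accessible_palette(n_colors: int = 8, style: str = 'categorical'):
+    """
+    Return a colorblind-friendly palette.
+    'categorical' returns a list of up to 8 hex colors; 'sequential' and
+    'diverging' return a dict naming a matplotlib colormap.
+    """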
+    palettes = {
+        'categorical': {
+            # Wong (2011) Nature Methods palette
+            3: ['#0072B2', '#D55E00', '#009E73'],
+            4: ['#0072B2', '#D55E00', '#009E73', '#CC79A7'],
+            5: ['#0072B2', '#D55E00', '#009E73', '#CC79A7', '#F0E442'],
+            8: ['#0072B2', '#D55E00', '#009E73', '#CC79A7',
+                '#F0E442', '#56B4E9', '#E69F00', '#000000']
+        },
+        'sequential': {
+            # Viridis-based (perceptually uniform)
+            'cmap': 'viridis'  # Also: 'cividis', 'inferno', 'magma'
+        },
+        'diverging': {
+            'cmap': 'RdBu_r'  # Also: 'coolwarm', 'BrBG'
+        }
+    }
+    if style == 'categorical':
+        n = min(n_colors, 8)
+        return palettes['categorical'].get(n, palettes['categorical'][8][:n])
+    else:
+        return palettes[style]
+# Usage
+colors = get_accessible_palette(4)
+```
+## Common Figure Types
+### Bar Charts with Error Bars
+```python
+def publication_barplot(data: dict, ylabel: str, title: str = '',
+                         output: str = 'figure.pdf'):
+    """
+    Create a publication-quality bar chart.
+    Args:
+        data: Dict mapping group names to (mean, std_error) tuples
+    """
+    setup_publication_style('nature')
+    colors = get_accessible_palette(len(data))
+    fig, ax = plt.subplots()
+    x = np.arange(len(data))
+    names = list(data.keys())
+    means = [data[k][0] for k in names]
+    errors = [data[k][1] for k in names]
+    bars = ax.bar(x, means, yerr=errors, capsize=3, color=colors,
+                  edgecolor='black', linewidth=0.5, width=0.6,
+                  error_kw={'linewidth': 0.5})
+    ax.set_xticks(x)
+    ax.set_xticklabels(names, rotation=0)
+    ax.set_ylabel(ylabel)
+    if title:
+        ax.set_title(title)
+    # Remove top and right spines
+    ax.spines['top'].set_visible(False)
+    ax.spines['right'].set_visible(False)
+    fig.savefig(output, dpi=300, bbox_inches='tight')
+    plt.close()
+    return output
+```
+### Scatter Plots with Regression Lines
+```python
+from scipy import stats
+def publication_scatter(x, y, xlabel, ylabel, output='scatter.pdf',
+                         groups=None, group_labels=None):
+    """Publication-quality scatter plot with optional regression line."""
+    setup_publication_style('nature')
+    fig, ax = plt.subplots()
+    if groups is None:
+        ax.scatter(x, y, s=15, alpha=0.7, color='#0072B2', edgecolors='none')
+        # Regression line
+        slope, intercept, r, p, se = stats.linregress(x, y)
+        x_fit = np.linspace(min(x), max(x), 100)
+        ax.plot(x_fit, slope*x_fit + intercept, '--', color='#D55E00', linewidth=0.8)
+        ax.text(0.05, 0.95, f'r = {r:.2f}, p = {p:.3f}',
+                transform=ax.transAxes, fontsize=6, va='top')
+    else:
+        colors = get_accessible_palette(len(set(groups)))
+        for i, label in enumerate(group_labels or sorted(set(groups))):
+            mask = np.array(groups) == label
+            ax.scatter(np.array(x)[mask], np.array(y)[mask],
+                      s=15, alpha=0.7, color=colors[i], label=label)
+        ax.legend(frameon=False)
+    ax.set_xlabel(xlabel)
+    ax.set_ylabel(ylabel)
+    ax.spines['top'].set_visible(False)
+    ax.spines['right'].set_visible(False)
+    fig.savefig(output, dpi=300, bbox_inches='tight')
+    plt.close()
+```
+## Multi-Panel Figures
+```python
+def multi_panel_figure(n_rows, n_cols, panel_data, output='multipanel.pdf'):
+    """Create a multi-panel figure with automatic panel labels."""
+    setup_publication_style('nature')
+    fig, axes = plt.subplots(n_rows, n_cols,
+                              figsize=(3.3*n_cols, 2.5*n_rows))
+    if n_rows * n_cols == 1:
+        axes = np.array([axes])
+    axes = axes.flatten()
+    labels = 'abcdefghijklmnopqrstuvwxyz'
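+    for i, ax in enumerate(axes[:len(panel_data)]):
+        # Plot panel_data[i] on ax here (the drawing code depends on the data)
+        # Add panel label
+        ax.text(-0.15, 1.05, labels[i], transform=ax.transAxes,
+                fontsize=10, fontweight='bold', va='bottom')
+    for ax in axes[len(panel_data):]:
+        ax.set_visible(False)  # hide unused panels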
+    plt.tight_layout()
+    fig.savefig(output, dpi=300, bbox_inches='tight')
+    plt.close()
+```
+## Export Best Practices
+1. **Vector formats first**: Use PDF or EPS for line art and charts; TIFF only for photographs
+2. **Font embedding**: Ensure all fonts are embedded (use `plt.rcParams['pdf.fonttype'] = 42`)
+3. **Check at print size**: View the figure at actual print size (3.3in wide) to verify readability
+4. **CMYK conversion**: For print journals, convert RGB to CMYK using ImageMagick or Photoshop
+5. **Consistent styling**: All figures in a paper should use the same fonts, colors, and styling
+```python
+# Ensure fonts are embedded in PDF output
+mpl.rcParams['pdf.fonttype'] = 42  # TrueType fonts
+mpl.rcParams['ps.fonttype'] = 42
+```
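For intuition about the CMYK conversion step, the naive device-independent RGB to CMYK formula is sketched below. This is an illustration only: conversions done in ImageMagick or Photoshop with ICC colour profiles will generally give different values. The function name `rgb_to_cmyk_naive` is hypothetical.

```python
def rgb_to_cmyk_naive(r: int, g: int, b: int) -> tuple[float, float, float, float]:
    """Convert 8-bit RGB to CMYK fractions in [0, 1] without colour management."""
    r_f, g_f, b_f = r / 255, g / 255, b / 255
    k = 1 - max(r_f, g_f, b_f)
    if k == 1:  # pure black: avoid division by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - r_f - k) / (1 - k)
    m = (1 - g_f - k) / (1 - k)
    y = (1 - b_f - k) / (1 - k)
    return c, m, y, k


# Blue from the Wong palette used earlier (#0072B2)
print(rgb_to_cmyk_naive(0x00, 0x72, 0xB2))
```

Saturated RGB colours often fall outside the printable CMYK gamut, so it is worth checking a converted proof rather than relying on the formula.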

package/skills/analysis/dataviz/python-dataviz-guide/SKILL.md ADDED

@@ -0,0 +1,195 @@
+---
+name: python-dataviz-guide
+description: "Publication-quality data visualization with matplotlib, seaborn, and plotly"
+metadata:
+  openclaw:
+    emoji: "📊"
+    category: "analysis"
+    subcategory: "dataviz"
+    keywords: ["data visualization", "chart design", "Python dataviz", "scientific figure creation", "publication quality figure"]
+    source: "N/A"
+---
+# Python Data Visualization Guide
+## Overview
+Data visualization is how researchers communicate quantitative findings. A well-designed figure can convey complex relationships instantly, while a poor one buries the signal in clutter. Python's visualization ecosystem -- anchored by matplotlib, seaborn, and plotly -- provides everything needed to produce publication-quality figures for journals, conferences, and presentations.
+This guide covers the three major Python visualization libraries, their strengths and trade-offs, and concrete recipes for the chart types researchers use most frequently. Each example is designed to be copy-paste ready and customizable for your specific dataset and venue requirements.
+The emphasis is on producing figures that meet journal standards: correct DPI, appropriate font sizes, accessible color palettes, and vector-format exports. We also cover interactive visualization with plotly for exploratory analysis and supplementary materials.
+## Matplotlib: The Foundation
+Matplotlib is the most flexible Python plotting library. Seaborn and the pandas plotting API build directly on it, so its configuration carries over to them.
+### Setting Up Publication Defaults
+```python
+import matplotlib.pyplot as plt
+import matplotlib as mpl
+# Publication-quality defaults
+plt.rcParams.update({
+    'figure.figsize': (6, 4),
+    'figure.dpi': 150,
+    'savefig.dpi': 300,
+    'savefig.bbox': 'tight',
+    'font.size': 11,
+    'font.family': 'serif',
+    'font.serif': ['Times New Roman'],
+    'axes.labelsize': 12,
+    'axes.titlesize': 13,
+    'xtick.labelsize': 10,
+    'ytick.labelsize': 10,
+    'legend.fontsize': 10,
+    'lines.linewidth': 1.5,
+    'lines.markersize': 6,
+    'axes.grid': True,
+    'grid.alpha': 0.3,
+})
+```
+### Line Plot with Error Bands
+```python
+import numpy as np
+epochs = np.arange(1, 51)
+acc_mean = 1 - 0.5 * np.exp(-epochs / 10)
+acc_std = 0.03 * np.exp(-epochs / 20)
+fig, ax = plt.subplots()
+ax.plot(epochs, acc_mean, label='Our Method', color='#2563EB')
+ax.fill_between(epochs, acc_mean - acc_std, acc_mean + acc_std,
+                alpha=0.2, color='#2563EB')
+ax.set_xlabel('Epoch')
+ax.set_ylabel('Accuracy')
+ax.set_ylim(0.4, 1.0)
+ax.legend(frameon=False)
+fig.savefig('accuracy_curve.pdf')  # Vector format for papers
+```
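The curve above uses synthetic values. With real experiments, the mean and the band usually come from repeated runs. A minimal sketch, assuming a hypothetical `runs` list that holds one accuracy-per-epoch list per random seed:

```python
from statistics import mean, stdev

# Hypothetical accuracies: 3 seeds x 4 epochs
runs = [
    [0.60, 0.70, 0.78, 0.82],
    [0.58, 0.72, 0.80, 0.84],
    [0.62, 0.68, 0.76, 0.80],
]

# zip(*runs) groups the values epoch by epoch
acc_mean = [mean(epoch_vals) for epoch_vals in zip(*runs)]
acc_std = [stdev(epoch_vals) for epoch_vals in zip(*runs)]  # sample std (n - 1)

# Band edges, one value per epoch
lower = [m - s for m, s in zip(acc_mean, acc_std)]
upper = [m + s for m, s in zip(acc_mean, acc_std)]
```

`lower` and `upper` can then be passed to `ax.fill_between` in place of the synthetic band. State in the caption whether the band shows a standard deviation, a standard error, or a confidence interval.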
+### Multi-Panel Figures
+```python
+fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharey=True)
+for ax, dataset, color in zip(axes, ['CIFAR-10', 'ImageNet', 'COCO'],
+                                ['#2563EB', '#DC2626', '#16A34A']):
+    x = np.random.randn(200)
+    ax.hist(x, bins=30, color=color, alpha=0.7, edgecolor='white')
+    ax.set_title(dataset)
+    ax.set_xlabel('Score Distribution')
+axes[0].set_ylabel('Count')
+plt.tight_layout()
+fig.savefig('multi_panel.pdf')
+```
+## Seaborn: Statistical Visualization
+Seaborn excels at statistical graphics with minimal code. It handles data frames natively and produces polished output by default.
+### Grouped Comparison Bar Chart
+```python
+import seaborn as sns
+import pandas as pd
+data = pd.DataFrame({
+    'Method': ['Baseline', 'Baseline', 'Ours', 'Ours', 'Ours+FT', 'Ours+FT'],
+    'Metric': ['BLEU', 'ROUGE'] * 3,
+    'Score': [34.2, 45.1, 41.8, 52.3, 48.5, 58.7]
+})
+fig, ax = plt.subplots(figsize=(8, 5))
+sns.barplot(data=data, x='Metric', y='Score', hue='Method',
+            palette=['#94A3B8', '#3B82F6', '#EF4444'], ax=ax)
+ax.set_ylabel('Score')
+ax.legend(title='Method', frameon=False)
+fig.savefig('comparison.pdf')
+```
+### Correlation Heatmap
+```python
+corr_matrix = pd.DataFrame(
+    np.random.randn(8, 8),
+    columns=[f'Feature {i}' for i in range(8)]
+).corr()
+fig, ax = plt.subplots(figsize=(8, 7))
+sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='RdBu_r',
+            center=0, square=True, linewidths=0.5, ax=ax)
+ax.set_title('Feature Correlation Matrix')
+fig.savefig('heatmap.pdf')
+```
+### Violin Plot for Distribution Comparison
+```python
+df = pd.DataFrame({
+    'Group': np.repeat(['Control', 'Treatment A', 'Treatment B'], 100),
+    'Value': np.concatenate([
+        np.random.normal(50, 10, 100),
+        np.random.normal(55, 8, 100),
+        np.random.normal(60, 12, 100)
+    ])
+})
+fig, ax = plt.subplots(figsize=(8, 5))
+sns.violinplot(data=df, x='Group', y='Value', palette='Set2',
+               inner='box', ax=ax)
+ax.set_ylabel('Measurement')
+fig.savefig('violin.pdf')
+```
+## Plotly: Interactive Visualization
+Plotly is ideal for exploratory analysis and HTML-based supplementary materials.
+```python
+import plotly.express as px
+df = px.data.gapminder().query("year == 2007")
+fig = px.scatter(df, x="gdpPercap", y="lifeExp",
+                 size="pop", color="continent",
+                 hover_name="country",
+                 log_x=True, size_max=60,
+                 title="GDP vs Life Expectancy (2007)")
+fig.write_html("interactive_scatter.html")
+fig.write_image("scatter.pdf")  # Requires kaleido
+```
+## Chart Type Selection Guide
+| Data Relationship | Recommended Chart | Library |
+|-------------------|-------------------|---------|
+| Trend over time | Line plot | matplotlib |
+| Distribution | Histogram, violin, box | seaborn |
+| Comparison (categories) | Bar chart, grouped bar | seaborn |
+| Correlation (2 vars) | Scatter plot | matplotlib/plotly |
+| Correlation (matrix) | Heatmap | seaborn |
+| Part-to-whole | Stacked bar (not pie) | matplotlib |
+| High-dimensional | PCA/t-SNE scatter | plotly |
+| Geospatial | Choropleth | plotly |
+## Best Practices
+- **Export as PDF or SVG for print, PNG at 300 DPI as fallback.** Never submit JPEG figures to journals.
+- **Use colorblind-safe palettes.** Use `sns.color_palette("colorblind")` or pick a scheme with a tool such as ColorBrewer.
+- **Label everything.** Axes, legends, and units should be readable without referring to the caption.
+- **Avoid chartjunk.** Remove unnecessary gridlines, borders, and decorative elements.
+- **Match the figure width to the journal column width.** Single-column is typically about 3.3 to 3.5 inches and double-column about 6.7 to 7.2 inches; check the journal's author guidelines.
+- **Use consistent styling across all figures in a paper.** Define a style dictionary once and reuse it.
+- **Include error bars or confidence intervals.** Raw point estimates without uncertainty are incomplete.
+## References
+- [matplotlib Documentation](https://matplotlib.org/stable/) -- Official reference
+- [seaborn Documentation](https://seaborn.pydata.org/) -- Statistical visualization
+- [plotly Documentation](https://plotly.com/python/) -- Interactive charts
+- [Scientific Visualization: Python + Matplotlib](https://github.com/rougier/scientific-visualization-book) -- Nicolas Rougier
+- [Ten Simple Rules for Better Figures](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003833) -- Rougier et al.

package/skills/analysis/econometrics/causal-inference-guide/SKILL.md ADDED

@@ -0,0 +1,197 @@
+---
+name: causal-inference-guide
+description: "Causal inference methods including DiD, IV, RDD, and synthetic control"
+metadata:
+  openclaw:
+    emoji: "link"
+    category: "analysis"
+    subcategory: "econometrics"
+    keywords: ["difference-in-differences", "DiD", "causal inference", "instrumental variables", "IV estimation", "regression discontinuity"]
+    source: "wentor"
+---
+# Causal Inference Guide
+A skill for applying quasi-experimental causal inference methods in observational research. Covers difference-in-differences, instrumental variables, regression discontinuity designs, and synthetic control methods with implementation code and diagnostic checks.
+## Difference-in-Differences (DiD)
+### Classic Two-Period DiD
+```python
+import numpy as np
+import pandas as pd
+import statsmodels.formula.api as smf
+def did_estimation(df: pd.DataFrame, outcome: str, treatment: str,
+                    post: str, covariates: list[str] = None) -> dict:
+    """
+    Estimate a difference-in-differences model.
+    Args:
+        df: Panel DataFrame
+        outcome: Name of outcome variable column
+        treatment: Name of treatment group indicator (0/1)
+        post: Name of post-treatment period indicator (0/1)
+        covariates: Optional list of control variable names
+    """
+    # Create interaction term
+    df = df.copy()
+    df['did'] = df[treatment] * df[post]
+    # Build formula
+    formula = f"{outcome} ~ {treatment} + {post} + did"
+    if covariates:
+        formula += ' + ' + ' + '.join(covariates)
+    model = smf.ols(formula, data=df).fit(cov_type='cluster',
+                                           cov_kwds={'groups': df.get('unit_id', df.index)})
+    return {
+        'did_estimate': model.params['did'],
+        'se': model.bse['did'],
+        'p_value': model.pvalues['did'],
+        'ci_95': (model.conf_int().loc['did', 0], model.conf_int().loc['did', 1]),
+        'r_squared': model.rsquared,
+        'n_obs': model.nobs,
+        'interpretation': (
+            f"The treatment effect is {model.params['did']:.3f} "
+            f"(SE = {model.bse['did']:.3f}, p = {model.pvalues['did']:.4f}). "
+            f"{'Statistically significant' if model.pvalues['did'] < 0.05 else 'Not significant'} "
+            f"at the 5% level."
+        )
+    }
+```
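In the two-group, two-period case without covariates, the coefficient on `did` equals a difference of four group means. A worked sketch with made-up numbers (the cell means and the helper name `did_from_means` are hypothetical):

```python
# Hypothetical mean outcomes for each group/period cell
means = {
    ('control', 'pre'): 10.0,
    ('control', 'post'): 12.0,
    ('treated', 'pre'): 11.0,
    ('treated', 'post'): 16.0,
}


def did_from_means(cell_means: dict) -> float:
    """(treated post - treated pre) - (control post - control pre)."""
    treated_change = cell_means[('treated', 'post')] - cell_means[('treated', 'pre')]
    control_change = cell_means[('control', 'post')] - cell_means[('control', 'pre')]
    return treated_change - control_change


effect = did_from_means(means)
print(effect)  # 3.0
```

The control group's change (2.0) stands in for what would have happened to the treated group without treatment, which is why the parallel trends assumption matters.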
+### Parallel Trends Test
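+Parallel trends is the key identifying assumption of DiD: absent treatment, the treated and control groups would have followed the same trend. It cannot be verified directly, but pre-treatment data can reveal violations: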
+```python
+def test_parallel_trends(df: pd.DataFrame, outcome: str,
+                          treatment: str, time: str,
+                          treatment_period: int) -> dict:
+    """
+    Test the parallel trends assumption using event study specification.
+    """
+    # Keep only pre-treatment observations so the reference period is unambiguous
+    df = df[df[time] < treatment_period].copy()
+    pre_periods = sorted(df[time].unique())
+    # Create period dummies interacted with treatment
+    for t in pre_periods:
+        df[f'pre_{t}'] = ((df[time] == t) & (df[treatment] == 1)).astype(int)
+    period_vars = [f'pre_{t}' for t in pre_periods[:-1]]  # omit last pre-period (reference)
+    formula = f"{outcome} ~ {' + '.join(period_vars)} + C({time}) + C(unit_id)"
+    model = smf.ols(formula, data=df).fit()
+    # Joint F-test: all pre-treatment interactions = 0 (constraints are comma-separated)
+    f_test = model.f_test(', '.join(f'{v} = 0' for v in period_vars))
+    f_stat = float(np.squeeze(f_test.fvalue))
+    f_pvalue = float(np.squeeze(f_test.pvalue))
+    return {
+        'pre_period_coefficients': {v: model.params[v] for v in period_vars},
+        'f_statistic': f_stat,
+        'f_pvalue': f_pvalue,
+        'parallel_trends_hold': f_pvalue > 0.05,
+        'interpretation': (
+            'Parallel trends assumption supported (cannot reject joint null)'
+            if f_test.pvalue > 0.05
+            else 'WARNING: Parallel trends assumption may be violated'
+        )
+    }
+```
+## Instrumental Variables (IV)
+### Two-Stage Least Squares
+```python
+from linearmodels.iv import IV2SLS
+def iv_estimation(df: pd.DataFrame, outcome: str, endogenous: str,
+                   instrument: str, exogenous: list[str] = None) -> dict:
+    """
+    Estimate an IV model using 2SLS.
+    Args:
+        outcome: Dependent variable
+        endogenous: Endogenous regressor
+        instrument: Instrumental variable
+        exogenous: List of exogenous control variables
+    """
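+    # Exogenous regressors always include a constant (intercept)
+    exog = df[list(exogenous or [])].assign(const=1.0)
+    model = IV2SLS(
+        dependent=df[outcome],
+        exog=exog,
+        endog=df[[endogenous]],
+        instruments=df[[instrument]]
+    ).fit(cov_type='robust')
+    # First-stage F-statistic for the excluded instrument, controlling for exogenous covariates
+    first_stage_formula = f"{endogenous} ~ {instrument}"
+    if exogenous:
+        first_stage_formula += ' + ' + ' + '.join(exogenous)
+    first_stage = smf.ols(first_stage_formula, data=df).fit()
+    f_stat = first_stage.tvalues[instrument] ** 2  # single instrument: F = t^2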
+    return {
+        'iv_estimate': model.params[endogenous],
+        'se': model.std_errors[endogenous],
+        'p_value': model.pvalues[endogenous],
+        'first_stage_F': f_stat,
+        'weak_instrument': f_stat < 10,  # Staiger-Stock rule of thumb
+        'interpretation': (
+            f"IV estimate: {model.params[endogenous]:.3f}. "
+            f"First-stage F = {f_stat:.1f} "
+            f"({'Strong' if f_stat >= 10 else 'WEAK'} instrument)."
+        )
+    }
+```
+### IV Diagnostic Checklist
+1. **Relevance**: First-stage F > 10 (rule of thumb from Staiger & Stock, 1997; Stock & Yogo, 2005, give formal critical values)
+2. **Exclusion restriction**: Instrument affects outcome only through the endogenous variable (untestable, argue conceptually)
+3. **Overidentification test**: Sargan/Hansen J-test when you have more instruments than endogenous variables
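With a single binary instrument and no covariates, the 2SLS estimate reduces to the Wald ratio: the instrument's effect on the outcome divided by its effect on the endogenous regressor. A minimal sketch with hypothetical data (the names `wald_estimate`, `z`, `d`, and `y` are illustrative):

```python
from statistics import mean


def wald_estimate(z: list[int], d: list[float], y: list[float]) -> float:
    """Wald IV estimate for a binary instrument z (0/1)."""
    y1 = mean(yi for zi, yi in zip(z, y) if zi == 1)
    y0 = mean(yi for zi, yi in zip(z, y) if zi == 0)
    d1 = mean(di for zi, di in zip(z, d) if zi == 1)
    d0 = mean(di for zi, di in zip(z, d) if zi == 0)
    first_stage = d1 - d0    # effect of z on the endogenous regressor
    reduced_form = y1 - y0   # effect of z on the outcome
    if first_stage == 0:
        raise ValueError("Instrument has no first-stage effect")
    return reduced_form / first_stage


# Hypothetical data generated as y = 2 + 2 * d
z = [0, 0, 0, 0, 1, 1, 1, 1]
d = [0, 0, 1, 0, 1, 1, 0, 1]
y = [2.0, 2.0, 4.0, 2.0, 4.0, 4.0, 2.0, 4.0]
print(wald_estimate(z, d, y))
```

A small first-stage difference in the denominator inflates the ratio and its variance, which is the intuition behind the relevance check.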
+## Regression Discontinuity Design (RDD)
+```python
+def rdd_estimation(df: pd.DataFrame, outcome: str, running_var: str,
+                    cutoff: float, bandwidth: float = None) -> dict:
+    """
+    Sharp regression discontinuity design estimation.
+    """
+    df = df.copy()
+    df['centered'] = df[running_var] - cutoff
+    df['treated'] = (df[running_var] >= cutoff).astype(int)
+    if bandwidth is None:
+        bandwidth = df['centered'].std()  # simple default
+    # Restrict to bandwidth
+    local = df[df['centered'].abs() <= bandwidth]
+    # Local linear regression
+    formula = f"{outcome} ~ treated * centered"
+    model = smf.ols(formula, data=local).fit(cov_type='HC1')
+    return {
+        'rdd_estimate': model.params['treated'],
+        'se': model.bse['treated'],
+        'p_value': model.pvalues['treated'],
+        'bandwidth': bandwidth,
+        'n_obs': len(local),
+        'n_treated': local['treated'].sum(),
+        'n_control': len(local) - local['treated'].sum()
+    }
+```
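The coefficient on `treated` in the interacted model above equals the gap between two separately fitted lines evaluated at the cutoff. A dependency-free sketch of that equivalence, using hypothetical noiseless data (the helpers `fit_line` and `rdd_jump` are illustrative and do not compute standard errors):

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def rdd_jump(running: list[float], outcome: list[float],
             cutoff: float, bandwidth: float) -> float:
    """Difference in fitted intercepts at the cutoff (right minus left)."""
    left = [(r - cutoff, o) for r, o in zip(running, outcome)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, o) for r, o in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    left_intercept, _ = fit_line(*zip(*left))
    right_intercept, _ = fit_line(*zip(*right))
    return right_intercept - left_intercept


# Hypothetical noiseless data with a jump of 3.0 at cutoff = 50
running = [float(v) for v in range(40, 61)]
outcome = [1.0 + 0.2 * (r - 50) if r < 50 else 4.0 + 0.1 * (r - 50)
           for r in running]
print(round(rdd_jump(running, outcome, cutoff=50.0, bandwidth=10.0), 6))  # 3.0
```

Re-running `rdd_jump` over several bandwidths is a simple way to carry out the bandwidth robustness check.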
+## Best Practices
+- Always visualize your data: plot outcome trends over time (DiD), first-stage relationships (IV), or running variable distributions (RDD)
+- Report robustness checks: varying bandwidths, alternative specifications, placebo tests
+- Use cluster-robust standard errors at the appropriate level (usually the treatment unit level)
+- Be transparent about identifying assumptions and potential violations
+- Pre-register your analysis plan when possible to avoid p-hacking concerns