@wentorai/research-plugins 1.2.2 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (141)
  1. package/README.md +16 -8
  2. package/openclaw.plugin.json +10 -3
  3. package/package.json +2 -5
  4. package/skills/analysis/dataviz/SKILL.md +25 -0
  5. package/skills/analysis/dataviz/chart-image-generator/SKILL.md +1 -1
  6. package/skills/analysis/econometrics/SKILL.md +23 -0
  7. package/skills/analysis/econometrics/robustness-checks/SKILL.md +1 -1
  8. package/skills/analysis/statistics/SKILL.md +21 -0
  9. package/skills/analysis/statistics/data-anomaly-detection/SKILL.md +1 -1
  10. package/skills/analysis/statistics/ml-experiment-tracker/SKILL.md +1 -1
  11. package/skills/analysis/statistics/{senior-data-scientist-guide → modeling-strategy-guide}/SKILL.md +5 -5
  12. package/skills/analysis/wrangling/SKILL.md +21 -0
  13. package/skills/analysis/wrangling/csv-data-analyzer/SKILL.md +1 -1
  14. package/skills/analysis/wrangling/data-cog-guide/SKILL.md +1 -1
  15. package/skills/domains/ai-ml/SKILL.md +37 -0
  16. package/skills/domains/biomedical/SKILL.md +28 -0
  17. package/skills/domains/biomedical/genomas-guide/SKILL.md +1 -1
  18. package/skills/domains/biomedical/med-researcher-guide/SKILL.md +1 -1
  19. package/skills/domains/biomedical/medgeclaw-guide/SKILL.md +1 -1
  20. package/skills/domains/business/SKILL.md +17 -0
  21. package/skills/domains/business/architecture-design-guide/SKILL.md +1 -1
  22. package/skills/domains/chemistry/SKILL.md +19 -0
  23. package/skills/domains/chemistry/computational-chemistry-guide/SKILL.md +1 -1
  24. package/skills/domains/cs/SKILL.md +21 -0
  25. package/skills/domains/ecology/SKILL.md +16 -0
  26. package/skills/domains/economics/SKILL.md +20 -0
  27. package/skills/domains/economics/post-labor-economics/SKILL.md +1 -1
  28. package/skills/domains/economics/pricing-psychology-guide/SKILL.md +1 -1
  29. package/skills/domains/education/SKILL.md +19 -0
  30. package/skills/domains/education/academic-study-methods/SKILL.md +1 -1
  31. package/skills/domains/education/edumcp-guide/SKILL.md +1 -1
  32. package/skills/domains/finance/SKILL.md +19 -0
  33. package/skills/domains/finance/akshare-finance-data/SKILL.md +1 -1
  34. package/skills/domains/finance/options-analytics-agent-guide/SKILL.md +1 -1
  35. package/skills/domains/finance/stata-accounting-research/SKILL.md +1 -1
  36. package/skills/domains/geoscience/SKILL.md +17 -0
  37. package/skills/domains/humanities/SKILL.md +16 -0
  38. package/skills/domains/humanities/history-research-guide/SKILL.md +1 -1
  39. package/skills/domains/humanities/political-history-guide/SKILL.md +1 -1
  40. package/skills/domains/law/SKILL.md +19 -0
  41. package/skills/domains/math/SKILL.md +17 -0
  42. package/skills/domains/pharma/SKILL.md +17 -0
  43. package/skills/domains/physics/SKILL.md +16 -0
  44. package/skills/domains/social-science/SKILL.md +17 -0
  45. package/skills/domains/social-science/sociology-research-methods/SKILL.md +1 -1
  46. package/skills/literature/discovery/SKILL.md +20 -0
  47. package/skills/literature/discovery/paper-recommendation-guide/SKILL.md +1 -1
  48. package/skills/literature/discovery/semantic-paper-radar/SKILL.md +1 -1
  49. package/skills/literature/fulltext/SKILL.md +26 -0
  50. package/skills/literature/metadata/SKILL.md +35 -0
  51. package/skills/literature/metadata/doi-content-negotiation/SKILL.md +4 -0
  52. package/skills/literature/metadata/doi-resolution-guide/SKILL.md +4 -0
  53. package/skills/literature/metadata/orcid-api/SKILL.md +4 -0
  54. package/skills/literature/metadata/orcid-integration-guide/SKILL.md +4 -0
  55. package/skills/literature/search/SKILL.md +43 -0
  56. package/skills/literature/search/paper-search-mcp-guide/SKILL.md +1 -1
  57. package/skills/research/automation/SKILL.md +21 -0
  58. package/skills/research/deep-research/SKILL.md +24 -0
  59. package/skills/research/deep-research/auto-deep-research-guide/SKILL.md +1 -1
  60. package/skills/research/deep-research/in-depth-research-guide/SKILL.md +1 -1
  61. package/skills/research/funding/SKILL.md +20 -0
  62. package/skills/research/methodology/SKILL.md +24 -0
  63. package/skills/research/paper-review/SKILL.md +19 -0
  64. package/skills/research/paper-review/paper-critique-framework/SKILL.md +1 -1
  65. package/skills/tools/code-exec/SKILL.md +18 -0
  66. package/skills/tools/diagram/SKILL.md +20 -0
  67. package/skills/tools/document/SKILL.md +21 -0
  68. package/skills/tools/knowledge-graph/SKILL.md +21 -0
  69. package/skills/tools/ocr-translate/SKILL.md +18 -0
  70. package/skills/tools/ocr-translate/handwriting-recognition-guide/SKILL.md +2 -0
  71. package/skills/tools/ocr-translate/latex-ocr-guide/SKILL.md +2 -0
  72. package/skills/tools/scraping/SKILL.md +17 -0
  73. package/skills/writing/citation/SKILL.md +33 -0
  74. package/skills/writing/citation/zotfile-attachment-guide/SKILL.md +2 -0
  75. package/skills/writing/composition/SKILL.md +22 -0
  76. package/skills/writing/composition/research-paper-writer/SKILL.md +1 -1
  77. package/skills/writing/composition/scientific-writing-wrapper/SKILL.md +1 -1
  78. package/skills/writing/latex/SKILL.md +22 -0
  79. package/skills/writing/latex/academic-writing-latex/SKILL.md +1 -1
  80. package/skills/writing/latex/latex-drawing-guide/SKILL.md +1 -1
  81. package/skills/writing/polish/SKILL.md +20 -0
  82. package/skills/writing/polish/chinese-text-humanizer/SKILL.md +1 -1
  83. package/skills/writing/templates/SKILL.md +22 -0
  84. package/skills/writing/templates/beamer-presentation-guide/SKILL.md +1 -1
  85. package/skills/writing/templates/scientific-article-pdf/SKILL.md +1 -1
  86. package/skills/analysis/dataviz/citation-map-guide/SKILL.md +0 -184
  87. package/skills/analysis/dataviz/data-visualization-principles/SKILL.md +0 -171
  88. package/skills/analysis/econometrics/empirical-paper-analysis/SKILL.md +0 -192
  89. package/skills/analysis/econometrics/panel-data-regression-workflow/SKILL.md +0 -267
  90. package/skills/analysis/econometrics/stata-regression/SKILL.md +0 -117
  91. package/skills/analysis/statistics/general-statistics-guide/SKILL.md +0 -226
  92. package/skills/analysis/statistics/infiagent-benchmark-guide/SKILL.md +0 -106
  93. package/skills/analysis/statistics/pywayne-statistics-guide/SKILL.md +0 -192
  94. package/skills/analysis/statistics/quantitative-methods-guide/SKILL.md +0 -193
  95. package/skills/analysis/wrangling/claude-data-analysis-guide/SKILL.md +0 -100
  96. package/skills/analysis/wrangling/open-data-scientist-guide/SKILL.md +0 -197
  97. package/skills/domains/ai-ml/annotated-dl-papers-guide/SKILL.md +0 -159
  98. package/skills/domains/humanities/digital-humanities-methods/SKILL.md +0 -232
  99. package/skills/domains/law/legal-research-methods/SKILL.md +0 -190
  100. package/skills/domains/social-science/sociology-research-guide/SKILL.md +0 -238
  101. package/skills/literature/discovery/arxiv-paper-monitoring/SKILL.md +0 -233
  102. package/skills/literature/discovery/paper-tracking-guide/SKILL.md +0 -211
  103. package/skills/literature/fulltext/zotero-scihub-guide/SKILL.md +0 -168
  104. package/skills/literature/search/arxiv-osiris/SKILL.md +0 -199
  105. package/skills/literature/search/deepgit-search-guide/SKILL.md +0 -147
  106. package/skills/literature/search/multi-database-literature-search/SKILL.md +0 -198
  107. package/skills/literature/search/papers-chat-guide/SKILL.md +0 -194
  108. package/skills/literature/search/pasa-paper-search-guide/SKILL.md +0 -138
  109. package/skills/literature/search/scientify-literature-survey/SKILL.md +0 -203
  110. package/skills/research/automation/ai-scientist-guide/SKILL.md +0 -228
  111. package/skills/research/automation/coexist-ai-guide/SKILL.md +0 -149
  112. package/skills/research/automation/foam-agent-guide/SKILL.md +0 -203
  113. package/skills/research/automation/research-paper-orchestrator/SKILL.md +0 -254
  114. package/skills/research/deep-research/academic-deep-research/SKILL.md +0 -190
  115. package/skills/research/deep-research/cognitive-kernel-guide/SKILL.md +0 -200
  116. package/skills/research/deep-research/corvus-research-guide/SKILL.md +0 -132
  117. package/skills/research/deep-research/deep-research-pro/SKILL.md +0 -213
  118. package/skills/research/deep-research/deep-research-work/SKILL.md +0 -204
  119. package/skills/research/deep-research/research-cog/SKILL.md +0 -153
  120. package/skills/research/methodology/academic-mentor-guide/SKILL.md +0 -169
  121. package/skills/research/methodology/deep-innovator-guide/SKILL.md +0 -242
  122. package/skills/research/methodology/research-pipeline-units-guide/SKILL.md +0 -169
  123. package/skills/research/paper-review/paper-compare-guide/SKILL.md +0 -238
  124. package/skills/research/paper-review/paper-digest-guide/SKILL.md +0 -240
  125. package/skills/research/paper-review/paper-research-assistant/SKILL.md +0 -231
  126. package/skills/research/paper-review/research-quality-filter/SKILL.md +0 -261
  127. package/skills/tools/code-exec/contextplus-mcp-guide/SKILL.md +0 -110
  128. package/skills/tools/diagram/clawphd-guide/SKILL.md +0 -149
  129. package/skills/tools/diagram/scientific-graphical-abstract/SKILL.md +0 -201
  130. package/skills/tools/document/md2pdf-xelatex/SKILL.md +0 -212
  131. package/skills/tools/document/openpaper-guide/SKILL.md +0 -232
  132. package/skills/tools/document/weknora-guide/SKILL.md +0 -216
  133. package/skills/tools/knowledge-graph/mimir-memory-guide/SKILL.md +0 -135
  134. package/skills/tools/knowledge-graph/open-webui-tools-guide/SKILL.md +0 -156
  135. package/skills/tools/ocr-translate/formula-recognition-guide/SKILL.md +0 -367
  136. package/skills/tools/ocr-translate/math-equation-renderer/SKILL.md +0 -198
  137. package/skills/tools/scraping/api-data-collection-guide/SKILL.md +0 -301
  138. package/skills/writing/citation/academic-citation-manager-guide/SKILL.md +0 -182
  139. package/skills/writing/composition/opendraft-thesis-guide/SKILL.md +0 -200
  140. package/skills/writing/composition/paper-debugger-guide/SKILL.md +0 -143
  141. package/skills/writing/composition/paperforge-guide/SKILL.md +0 -205
@@ -1,193 +0,0 @@
- ---
- name: quantitative-methods-guide
- description: "Design and execute statistical analyses with regression modeling"
- metadata:
-   openclaw:
-     emoji: "📈"
-     category: "analysis"
-     subcategory: "statistics"
-     keywords: ["regression analysis", "quantitative methods", "research design", "statistical modeling", "OLS", "logistic regression"]
-     source: "https://github.com/AcademicSkills/quantitative-methods-guide"
- ---
-
- # Quantitative Methods Guide
-
- A skill for designing and executing rigorous quantitative analyses in academic research. Covers the full pipeline from research question formulation through variable operationalization, model specification, estimation, diagnostics, and interpretation, with emphasis on regression modeling as the workhorse of empirical research.
-
- ## Overview
-
- Quantitative methods form the foundation of empirical research across the social sciences, health sciences, economics, education, and many STEM fields. This skill provides a structured approach to the entire quantitative analysis workflow, ensuring that researchers make methodologically sound choices at each stage. It treats regression analysis as the central tool, covering ordinary least squares (OLS), logistic regression, Poisson regression, and multilevel models, while also addressing the broader issues of research design, measurement, and causal inference that determine whether regression results are meaningful.
-
- The skill is designed for graduate students and researchers who have basic statistics knowledge but need guidance on applying methods correctly in their own research contexts.
-
- ## Research Design and Variable Specification
-
- ### From Question to Model
-
- ```
- Research Question: "Does mentoring frequency affect publication output among
- junior faculty, controlling for department size and funding?"
-
- Step 1: Identify variables
- - Outcome (Y): publication_count (count data)
- - Predictor (X1): mentoring_hours_per_month (continuous)
- - Controls: department_size (continuous), total_funding (continuous)
- - Potential moderator: career_stage (categorical: assistant/associate)
-
- Step 2: Choose model family
- - Count outcome → Poisson or Negative Binomial regression
- - Check for overdispersion before deciding
-
- Step 3: Specify the model
- publications ~ mentoring_hours + department_size + log(funding) + career_stage
- Optional: publications ~ mentoring_hours * career_stage + controls (interaction)
- ```
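Step 2's overdispersion check can be sketched as a quick variance-to-mean ratio. This is a rough screen rather than a formal test, and the helper name and simulated data are illustrative:

```python
import numpy as np

def overdispersion_ratio(counts) -> float:
    # Variance/mean of the raw counts. A Poisson-distributed outcome
    # gives a ratio near 1; values well above 1 suggest fitting a
    # Negative Binomial model instead.
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

rng = np.random.default_rng(42)
poisson_counts = rng.poisson(3.0, size=2000)            # ratio near 1
nb_counts = rng.negative_binomial(2, 0.3, size=2000)    # ratio well above 1
```

A more formal alternative is to fit the Poisson model first and run a score test for overdispersion on its residuals.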
-
- ### Variable Types and Measurement
-
- | Variable Type | Examples | Modeling Considerations |
- |---------------|----------|-------------------------|
- | Continuous | Income, GPA, temperature | Check distribution, consider transformations |
- | Binary | Pass/fail, treatment/control | Logistic regression |
- | Count | Publications, citations, events | Poisson or negative binomial |
- | Ordinal | Likert scales, rankings | Ordinal logistic, or treat as continuous if 5+ levels |
- | Nominal | Department, country, method | Dummy coding (k-1 indicators) |
- | Time-to-event | Months until graduation | Survival analysis |
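The k-1 dummy coding in the Nominal row can be sketched with pandas (column names illustrative):

```python
import pandas as pd

# k-1 indicator columns for a 3-level nominal variable; the dropped
# first level ("bio") becomes the reference category.
df = pd.DataFrame({"dept": ["bio", "chem", "phys", "bio"]})
dummies = pd.get_dummies(df["dept"], prefix="dept", drop_first=True)
print(dummies.columns.tolist())  # ['dept_chem', 'dept_phys']
```

In the formula interface used below, wrapping a variable as `C(group)` applies this coding automatically.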
-
- ## Regression Analysis
-
- ### Ordinary Least Squares (OLS)
-
- ```python
- import statsmodels.formula.api as smf
- import pandas as pd
-
- def run_ols_analysis(df: pd.DataFrame, formula: str) -> dict:
-     """
-     Fit an OLS regression model with full diagnostics.
-
-     Args:
-         df: DataFrame with all variables
-         formula: Patsy formula (e.g., 'y ~ x1 + x2 + C(group)')
-     """
-     model = smf.ols(formula=formula, data=df).fit(cov_type='HC3')  # robust SE
-
-     results = {
-         'coefficients': model.params.to_dict(),
-         'std_errors': model.bse.to_dict(),
-         'p_values': model.pvalues.to_dict(),
-         'conf_int': model.conf_int().to_dict(),
-         'r_squared': model.rsquared,
-         'adj_r_squared': model.rsquared_adj,
-         'f_statistic': model.fvalue,
-         'f_pvalue': model.f_pvalue,
-         'n_obs': int(model.nobs),
-         'aic': model.aic,
-         'bic': model.bic
-     }
-     return results
-
- # Example usage:
- # results = run_ols_analysis(df, 'gpa ~ study_hours + sleep_hours + C(major)')
- ```
-
- ### Logistic Regression
-
- ```python
- import numpy as np
-
- def run_logistic_analysis(df: pd.DataFrame, formula: str) -> dict:
-     """
-     Fit a logistic regression for binary outcomes.
-     Reports odds ratios alongside coefficients.
-     """
-     model = smf.logit(formula=formula, data=df).fit(disp=False)
-
-     results = {
-         'coefficients': model.params.to_dict(),
-         'odds_ratios': np.exp(model.params).to_dict(),
-         'p_values': model.pvalues.to_dict(),
-         'conf_int_OR': np.exp(model.conf_int()).to_dict(),
-         'pseudo_r_squared': model.prsquared,
-         'log_likelihood': model.llf,
-         'aic': model.aic,
-         'n_obs': int(model.nobs)
-     }
-     return results
- ```
-
- ## Model Diagnostics
-
- ### OLS Assumption Checks
-
- Run these diagnostics after fitting any OLS model:
-
- 1. **Linearity**: Plot residuals vs. fitted values. Look for no systematic pattern.
- 2. **Normality of residuals**: Q-Q plot and Shapiro-Wilk test on residuals.
- 3. **Homoscedasticity**: Breusch-Pagan test (`statsmodels.stats.diagnostic.het_breuschpagan`).
- 4. **No multicollinearity**: Variance Inflation Factor (VIF) for each predictor.
- 5. **Independence**: Durbin-Watson statistic for autocorrelation (especially panel/time data).
-
- ```python
- from scipy import stats
- from statsmodels.stats.outliers_influence import variance_inflation_factor
- from statsmodels.stats.diagnostic import het_breuschpagan
-
- def check_ols_assumptions(model, X_matrix) -> dict:
-     """
-     Run standard OLS diagnostic tests.
-     """
-     residuals = model.resid
-     fitted = model.fittedvalues
-
-     # VIF for multicollinearity
-     vif = {X_matrix.columns[i]: variance_inflation_factor(X_matrix.values, i)
-            for i in range(X_matrix.shape[1])}
-     multicollinearity_flag = any(v > 10 for v in vif.values())
-
-     # Breusch-Pagan for heteroscedasticity
-     bp_stat, bp_p, _, _ = het_breuschpagan(residuals, X_matrix)
-
-     # Shapiro-Wilk for residual normality
-     _, normality_p = stats.shapiro(residuals[:5000])  # cap at 5000
-
-     return {
-         'vif': vif,
-         'multicollinearity_problem': multicollinearity_flag,
-         'breusch_pagan_p': round(bp_p, 4),
-         'heteroscedasticity_problem': bp_p < 0.05,
-         'residual_normality_p': round(normality_p, 4),
-         'recommendation': 'Use HC3 robust standard errors if heteroscedasticity detected'
-     }
- ```
-
- ## Reporting Regression Results
-
- ### Standard Regression Table Format
-
- | Variable | Coefficient | SE | t | p | 95% CI |
- |----------|-------------|------|------|-------|---------|
- | (Intercept) | 2.34 | 0.45 | 5.20 | <.001 | [1.45, 3.23] |
- | Mentoring hours | 0.18 | 0.06 | 3.00 | .003 | [0.06, 0.30] |
- | Dept. size | -0.02 | 0.01 | -2.00 | .048 | [-0.04, -0.00] |
- | Log(Funding) | 0.31 | 0.12 | 2.58 | .011 | [0.07, 0.55] |
-
- Report: N, R-squared, Adjusted R-squared, F-statistic with df and p-value, and the type of standard errors used (e.g., HC3 robust).
-
- ### Interpretation Template
-
- "A one-unit increase in [predictor] is associated with a [coefficient] [unit] change in [outcome], holding all other variables constant (b = [coef], SE = [se], p = [p], 95% CI [[lower], [upper]])."
-
- For logistic regression: "A one-unit increase in [predictor] is associated with [OR]-times higher odds of [outcome] (OR = [or], 95% CI [[lower], [upper]], p = [p])."
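The logistic template can be filled mechanically from a log-odds coefficient and its confidence bounds. A minimal sketch, with the helper name and numbers illustrative (taken from the mentoring example above):

```python
import numpy as np

def interpret_logit(name, coef, ci_low, ci_high, p):
    # Exponentiate the log-odds coefficient and its CI to the
    # odds-ratio scale, then render the reporting template.
    or_, lo, hi = np.exp([coef, ci_low, ci_high])
    return (f"A one-unit increase in {name} is associated with "
            f"{or_:.2f}-times higher odds of the outcome "
            f"(OR = {or_:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], p = {p:.3f}).")

sentence = interpret_logit("mentoring_hours", 0.18, 0.06, 0.30, 0.003)
```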
-
- ## Common Pitfalls
-
- - **Omitted variable bias**: Failing to control for confounders that affect both X and Y.
- - **Overfitting**: Including too many predictors relative to sample size (rule of thumb: 10-20 observations per predictor).
- - **p-hacking**: Running many models and reporting only significant results. Pre-register your analysis plan.
- - **Misinterpreting R-squared**: High R-squared does not imply causation; low R-squared does not mean the model is useless.
- - **Ignoring assumptions**: Always run diagnostics. Violated assumptions can invalidate standard errors and p-values.
-
- ## References
-
- - Wooldridge, J. M. (2019). *Introductory Econometrics* (7th ed.). Cengage.
- - Gelman, A., & Hill, J. (2007). *Data Analysis Using Regression and Multilevel/Hierarchical Models*. Cambridge University Press.
- - King, G. (1986). How Not to Lie with Statistics. *American Journal of Political Science*, 30(3), 666-687.
@@ -1,100 +0,0 @@
- ---
- name: claude-data-analysis-guide
- description: "Claude Code-based conversational data analysis agent"
- metadata:
-   openclaw:
-     emoji: "🔬"
-     category: "analysis"
-     subcategory: "wrangling"
-     keywords: ["Claude Code", "data analysis", "conversational", "pandas", "visualization", "interactive"]
-     source: "https://github.com/liangdabiao/claude-data-analysis"
- ---
-
- # Claude Data Analysis Guide
-
- ## Overview
-
- A Claude Code-based data analysis agent that provides conversational data exploration and analysis. Upload datasets and ask questions in natural language — the agent writes and executes Python code for data cleaning, statistical analysis, visualization, and reporting. Leverages Claude Code's ability to read files, run code, and iterate on results.
-
- ## Setup
-
- ```markdown
- ### CLAUDE.md Configuration
- # Data Analysis Project
-
- ## Instructions
- - Analyze data files in the data/ directory
- - Use pandas, numpy, scipy, matplotlib, seaborn
- - Always show data shape and dtypes first
- - Include statistical tests where appropriate
- - Generate publication-quality figures (300 DPI)
- - Save outputs to output/ directory
-
- ## Conventions
- - Use seaborn for statistical plots
- - Report confidence intervals, not just p-values
- - Handle missing data explicitly (report, then impute)
- - Set random_state=42 for reproducibility
- ```
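The "confidence intervals, not just p-values" convention can be sketched with a t-based interval for a sample mean (a minimal helper, assuming roughly normal data; the function name is illustrative):

```python
import numpy as np
from scipy import stats

def mean_ci(x, level=0.95):
    # t-based confidence interval for the mean of a sample.
    x = np.asarray(x, dtype=float)
    m, se = x.mean(), stats.sem(x)
    half = se * stats.t.ppf((1 + level) / 2, len(x) - 1)
    return m - half, m + half

lo, hi = mean_ci([4.1, 3.9, 4.4, 4.0, 4.2])
```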
-
- ## Workflow
-
- ```markdown
- ### Interactive Analysis Loop
- 1. "Load the experiment data from data/results.csv"
-    → Agent reads file, shows shape, dtypes, head()
-
- 2. "How many missing values are there?"
-    → Agent runs df.isnull().sum(), reports per column
-
- 3. "Show the distribution of response time by condition"
-    → Agent creates violin plots, reports summary stats
-
- 4. "Is there a significant difference between groups?"
-    → Agent runs appropriate test (t-test, ANOVA, etc.)
-
- 5. "Build a regression model predicting response time"
-    → Agent fits model, reports coefficients, R², diagnostics
-
- 6. "Create a summary report with all findings"
-    → Agent generates markdown report with embedded figures
- ```
-
- ## Common Analysis Patterns
-
- ```python
- # Data profiling
- import pandas as pd
- df = pd.read_csv("data/experiment.csv")
- print(f"Shape: {df.shape}")
- print(f"\nDtypes:\n{df.dtypes}")
- print(f"\nMissing:\n{df.isnull().sum()}")
- print(f"\nDescribe:\n{df.describe()}")
-
- # Statistical comparison
- from scipy import stats
- group_a = df[df["condition"] == "A"]["score"]
- group_b = df[df["condition"] == "B"]["score"]
- t_stat, p_value = stats.ttest_ind(group_a, group_b)
- print(f"t={t_stat:.3f}, p={p_value:.4f}")
-
- # Visualization
- import seaborn as sns
- import matplotlib.pyplot as plt
- fig, ax = plt.subplots(figsize=(8, 5))
- sns.violinplot(data=df, x="condition", y="score", ax=ax)
- plt.savefig("output/comparison.png", dpi=300, bbox_inches="tight")
- ```
-
- ## Use Cases
-
- 1. **Experiment analysis**: Interactive analysis of lab data
- 2. **EDA**: Rapid exploration of unfamiliar datasets
- 3. **Statistical testing**: Guided hypothesis testing
- 4. **Report generation**: Analysis reports with figures
- 5. **Learning**: Interactive data science exploration
-
- ## References
-
- - [claude-data-analysis GitHub](https://github.com/liangdabiao/claude-data-analysis)
- - [Claude Code](https://docs.anthropic.com/en/docs/claude-code)
@@ -1,197 +0,0 @@
- ---
- name: open-data-scientist-guide
- description: "AI agent that performs end-to-end data science workflows"
- metadata:
-   openclaw:
-     emoji: "📊"
-     category: "analysis"
-     subcategory: "wrangling"
-     keywords: ["data science", "automated analysis", "EDA", "feature engineering", "data wrangling", "AI agent"]
-     source: "https://github.com/Open-Data-Scientist/open-data-scientist"
- ---
-
- # Open Data Scientist Guide
-
- ## Overview
-
- Open Data Scientist is an AI agent that automates end-to-end data science workflows — from data loading and cleaning through exploratory analysis, feature engineering, modeling, and report generation. It interprets natural language task descriptions, generates and executes Python code, iteratively refines analyses based on results, and produces publication-ready outputs. Designed for researchers who need quick, thorough data analyses without deep programming expertise.
-
- ## Workflow Pipeline
-
- ```
- Dataset + Task Description
-
- Data Profiling (types, distributions, missing values)
-
- Cleaning & Preprocessing (imputation, encoding, scaling)
-
- Exploratory Data Analysis (correlations, distributions, outliers)
-
- Feature Engineering (transforms, interactions, selection)
-
- Modeling (train, evaluate, compare)
-
- Report Generation (figures, tables, interpretation)
- ```
-
- ## Usage
-
- ```python
- from open_data_scientist import DataScientist
-
- ds = DataScientist(llm_provider="anthropic")
-
- # Natural language task
- result = ds.analyze(
-     data="experiment_results.csv",
-     task="Identify which experimental conditions significantly affect "
-          "the response variable. Build a predictive model and report "
-          "the most important features.",
- )
-
- # Outputs
- print(result.summary)              # Text summary of findings
- result.save_report("report.html")  # Full HTML report
- result.save_figures("figures/")    # All generated plots
- ```
-
- ## Data Profiling
-
- ```python
- # Automatic data profiling before analysis
- profile = ds.profile("dataset.csv")
-
- print(f"Rows: {profile.n_rows}, Columns: {profile.n_cols}")
- print(f"Missing values: {profile.missing_summary}")
- print(f"Data types: {profile.dtype_summary}")
- print(f"Potential issues: {profile.warnings}")
-
- # Column-level details
- for col in profile.columns:
-     print(f"\n{col.name} ({col.dtype}):")
-     print(f"  Unique: {col.n_unique}")
-     print(f"  Missing: {col.n_missing} ({col.pct_missing:.1f}%)")
-     if col.is_numeric:
-         print(f"  Range: [{col.min}, {col.max}]")
-         print(f"  Mean: {col.mean:.3f}, Std: {col.std:.3f}")
- ```
-
- ## Exploratory Data Analysis
-
- ```python
- # Guided EDA
- eda_result = ds.explore(
-     data="dataset.csv",
-     focus="relationships",  # or "distributions", "outliers", "time_trends"
-     target_column="outcome",
- )
-
- # Generated analyses include:
- # - Correlation heatmap
- # - Pairwise scatter plots for top correlations
- # - Distribution plots per group
- # - Statistical tests (t-test, ANOVA, chi-square)
- # - Outlier detection (IQR, Z-score)
-
- for finding in eda_result.findings:
-     print(f"- {finding.description} (p={finding.p_value:.4f})")
- ```
-
- ## Feature Engineering
-
- ```python
- # Automatic feature engineering
- features = ds.engineer_features(
-     data="dataset.csv",
-     target="outcome",
-     strategies=[
-         "polynomial_interactions",  # x1*x2, x1^2
-         "datetime_extraction",      # year, month, day_of_week
-         "text_embeddings",          # TF-IDF or sentence embeddings
-         "binning",                  # numeric to categorical
-         "target_encoding",          # category to target mean
-     ],
-     selection_method="mutual_information",
-     max_features=50,
- )
-
- print(f"Original features: {features.n_original}")
- print(f"Generated features: {features.n_generated}")
- print(f"Selected features: {features.n_selected}")
- ```
-
- ## Modeling Pipeline
-
- ```python
- result = ds.model(
-     data="dataset.csv",
-     target="outcome",
-     task_type="classification",  # or "regression"
-     models=["logistic_regression", "random_forest",
-             "gradient_boosting", "neural_network"],
-     cv_folds=5,
-     metric="f1_macro",
- )
-
- # Model comparison table
- print(result.comparison_table)
- # | Model              | F1 Macro | Accuracy | AUC   |
- # |--------------------|----------|----------|-------|
- # | Gradient Boosting  | 0.847    | 0.862    | 0.921 |
- # | Random Forest      | 0.831    | 0.849    | 0.908 |
- # | ...                |          |          |       |
-
- # Best model details
- best = result.best_model
- print(f"Best: {best.name}")
- print(f"Feature importance:\n{best.feature_importance.head(10)}")
- ```
-
- ## Report Generation
-
- ```python
- # Generate publication-ready report
- result = ds.analyze(
-     data="experiment_results.csv",
-     task="Full analysis with statistical tests",
-     report_config={
-         "format": "html",      # html, pdf, markdown
-         "style": "academic",   # academic, business, minimal
-         "include_code": True,  # Show generated code
-         "figure_dpi": 300,     # Publication quality
-     },
- )
-
- result.save_report("analysis_report.html")
- ```
-
- ## Configuration
-
- ```python
- ds = DataScientist(
-     llm_provider="anthropic",
-     model="claude-sonnet-4-20250514",
-     execution_config={
-         "timeout": 300,        # Max seconds per code block
-         "max_iterations": 10,  # Refinement iterations
-         "sandbox": True,       # Isolated execution
-     },
-     analysis_config={
-         "significance_level": 0.05,
-         "random_state": 42,
-         "test_size": 0.2,
-     },
- )
- ```
-
- ## Use Cases
-
- 1. **Experiment analysis**: Analyze lab or survey data with statistical tests
- 2. **Dataset exploration**: Quick EDA on unfamiliar datasets
- 3. **Baseline modeling**: Rapid prototyping of predictive models
- 4. **Report generation**: Automated analysis reports for publications
-
- ## References
-
- - [Open Data Scientist GitHub](https://github.com/Open-Data-Scientist/open-data-scientist)
- - [Pandas Profiling](https://github.com/ydataai/ydata-profiling)
@@ -1,159 +0,0 @@
1
- ---
2
- name: annotated-dl-papers-guide
3
- description: "Annotated deep learning paper implementations with side-by-side notes"
4
- metadata:
5
- openclaw:
6
- emoji: "📝"
7
- category: "domains"
8
- subcategory: "ai-ml"
9
- keywords: ["deep-learning", "paper-implementation", "annotations", "transformer", "gan", "diffusion"]
10
- source: "https://github.com/labmlai/annotated_deep_learning_paper_implementations"
11
- ---
12
-
13
- # Annotated Deep Learning Papers Guide
14
-
15
- ## Overview
16
-
17
- The annotated_deep_learning_paper_implementations project, maintained by labml.ai with over 66,000 GitHub stars, provides 60+ implementations of influential deep learning papers with detailed, line-by-line annotations. Each implementation is presented as a literate programming document where the code and explanations are interwoven, making it possible to read the paper and understand the implementation simultaneously.
18
-
19
- This project bridges the gap between reading a research paper and understanding its practical implementation. For academic researchers, this is an essential resource because many breakthrough papers omit crucial implementation details, and reproducing results from a paper description alone can take weeks. The annotated implementations cover transformers, GANs, diffusion models, reinforcement learning algorithms, optimizers, and many other core deep learning building blocks.
20
-
21
- All implementations are written in PyTorch and are designed to be self-contained, readable, and runnable. The project also provides a web interface at papers.labml.ai where you can browse implementations with syntax-highlighted code alongside formatted annotations.
22
-
23
## Installation and Setup

Install the labml packages to use the implementations and experiment tracking:

```bash
# Install the core library
pip install labml labml-nn

# Clone for direct access to all implementations
git clone https://github.com/labmlai/annotated_deep_learning_paper_implementations.git
cd annotated_deep_learning_paper_implementations

# Install in development mode
pip install -e .
```

Requirements:

- Python 3.8+
- PyTorch >= 1.9
- labml >= 0.5 (experiment tracking and configuration)
- numpy and einops for tensor operations

The `labml` library provides experiment tracking, configuration management, and training loop utilities that are used throughout the implementations.

## Core Paper Categories

### Transformers and Attention

The project includes comprehensive implementations of the transformer family:

- **Original Transformer** (Vaswani et al., 2017): Multi-head attention, positional encoding, encoder-decoder architecture
- **GPT and GPT-2**: Autoregressive language modeling with causal attention
- **BERT**: Masked language modeling and next sentence prediction
- **Vision Transformer (ViT)**: Applying transformers to image classification
- **Flash Attention**: Memory-efficient attention computation
- **Rotary Position Embeddings (RoPE)**: Position encoding used in modern LLMs
- **Mixture of Experts (MoE)**: Sparse expert routing for scaling models

```python
# Example: Multi-head attention from the transformer implementation
from labml_nn.transformers.mha import MultiHeadAttention

# The implementation includes detailed annotations explaining
# each step of the attention computation
mha = MultiHeadAttention(
    heads=8,
    d_model=512,
    dropout_prob=0.1
)
```

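For intuition, the computation that each attention head performs, softmax(QKᵀ/√d)·V, can be sketched in plain Python. This is a minimal single-head, unbatched illustration, not labml's API:

```python
import math

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention on plain lists: softmax(Q K^T / sqrt(d)) V."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Score this query against every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        # Numerically stable softmax over the scores
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted sum of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(scaled_dot_product_attention(Q, K, V))
```

The query aligns with the first key, so the output is pulled toward the first value vector; the real implementation adds learned projections, multiple heads, batching, and masking.
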
### Generative Models

- **GAN** (Goodfellow et al., 2014): Original generative adversarial network
- **DCGAN**: Deep convolutional GAN with architectural guidelines
- **StyleGAN**: Style-based generator architecture
- **Diffusion Models (DDPM)**: Denoising diffusion probabilistic models
- **Stable Diffusion**: Latent diffusion with CLIP conditioning
- **VAE**: Variational autoencoders with KL divergence

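As a taste of the DDPM math, the forward (noising) process has a closed form, x_t = √ᾱ_t·x₀ + √(1−ᾱ_t)·ε with ᾱ_t the cumulative product of (1−β). A pure-Python sketch, assuming the standard linear β schedule (function names here are illustrative, not labml's):

```python
import math
import random

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def alpha_bar(betas, t):
    """Cumulative product of (1 - beta) up to step t (0-indexed)."""
    prod = 1.0
    for b in betas[: t + 1]:
        prod *= 1.0 - b
    return prod

def q_sample(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps."""
    ab = alpha_bar(betas, t)
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
            for x in x0]

betas = linear_beta_schedule(T=1000)
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]
x_early = q_sample(x0, t=10, betas=betas, rng=rng)    # still close to x0
x_late = q_sample(x0, t=999, betas=betas, rng=rng)    # nearly pure noise
```

Because ᾱ_t decays toward zero, early steps barely perturb the data while late steps are essentially Gaussian noise; the model is trained to reverse this process.
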
### Optimization and Training

- **Adam, AdamW**: Adaptive learning rate optimizers
- **LAMB**: Large batch optimization for distributed training
- **Noam learning rate schedule**: Warmup + inverse square root decay
- **Gradient clipping**: Norm-based and value-based clipping
- **Mixed precision training**: FP16/BF16 training techniques

### Normalization and Regularization

- **Batch Normalization**: Per-batch statistics normalization
- **Layer Normalization**: Per-sample normalization for transformers
- **RMSNorm**: Simplified normalization used in LLaMA
- **Dropout and DropPath**: Stochastic regularization methods

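RMSNorm drops LayerNorm's mean-centering and bias, normalizing by the root mean square alone: yᵢ = xᵢ / RMS(x) · gᵢ with RMS(x) = √(mean(x²) + ε). A pure-Python sketch (gain initialized to ones; the ε value is illustrative):

```python
import math

def rms_norm(x, gain=None, eps=1e-8):
    """y_i = x_i / sqrt(mean(x_j^2) + eps) * g_i -- no mean subtraction, no bias."""
    if gain is None:
        gain = [1.0] * len(x)  # learnable per-feature gain; ones here
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * g for v, g in zip(x, gain)]

y = rms_norm([3.0, -4.0])
# RMS of [3, -4] is sqrt((9 + 16) / 2), so the output has unit RMS
```
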
## Using Implementations in Research

Each implementation can be used as a building block in your own research projects. The modular design allows you to swap components easily:

```python
import torch.nn as nn

from labml_nn.transformers.mha import MultiHeadAttention
from labml_nn.normalization.rmsnorm import RMSNorm

# Build a custom transformer block with RMSNorm instead of LayerNorm
class CustomTransformerBlock(nn.Module):
    def __init__(self, d_model: int, heads: int, d_ff: int):
        super().__init__()
        self.attention = MultiHeadAttention(heads, d_model)
        self.norm1 = RMSNorm(d_model)
        self.norm2 = RMSNorm(d_model)
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x):
        # Pre-norm residual attention: normalize once, reuse for q, k, v
        z = self.norm1(x)
        x = x + self.attention(query=z, key=z, value=z)
        x = x + self.feed_forward(self.norm2(x))
        return x
```

The experiment tracking integration with labml makes it straightforward to log metrics, hyperparameters, and model checkpoints:

```python
from labml import experiment, tracker

# Create an experiment
experiment.create(name="custom_transformer_ablation")

# Start it, then track metrics during training
# (num_epochs, dataloader, and train_step are placeholders for your own loop)
with experiment.start():
    for epoch in range(num_epochs):
        for batch in dataloader:
            loss = train_step(batch)
            tracker.save({"loss": loss, "epoch": epoch})
```

## Research Workflow Integration

This project fits naturally into an academic deep learning research workflow:

1. **Literature review**: Read the annotated implementation alongside the original paper to build deep understanding
2. **Baseline reproduction**: Use the provided implementation as a verified baseline for comparison experiments
3. **Architecture modification**: Fork a specific implementation and modify components for your research hypothesis
4. **Ablation studies**: Systematically disable or replace components to measure their contribution
5. **Paper writing**: Reference the annotated implementation for accurate method descriptions

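Step 4's ablation bookkeeping often amounts to swapping one component at a time against a fixed baseline configuration. A toy sketch of that pattern (the configuration keys and the `run_trial` stub are hypothetical, not labml's API):

```python
# Hypothetical ablation grid: vary one component at a time against a baseline
baseline = {"norm": "layernorm", "pos_enc": "rope", "attention": "full"}
variants = {
    "norm": ["layernorm", "rmsnorm"],
    "pos_enc": ["rope", "learned"],
}

def run_trial(config):
    """Stub standing in for a real training run; returns a fake metric."""
    return sum(len(v) for v in config.values()) / 100.0

results = {}
for component, options in variants.items():
    for option in options:
        # Copy the baseline and override exactly one component
        config = dict(baseline, **{component: option})
        results[f"{component}={option}"] = run_trial(config)

for name, metric in sorted(results.items()):
    print(name, metric)
```
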
The web interface at nn.labml.ai provides a searchable index of all implementations, organized by topic. Each page shows the paper citation, a brief summary, and the annotated code with toggleable explanations.

## References

- Repository: https://github.com/labmlai/annotated_deep_learning_paper_implementations
- Web interface: https://nn.labml.ai/
- labml experiment tracking: https://github.com/labmlai/labml
- PyTorch documentation: https://pytorch.org/docs/stable/