@wentorai/research-plugins 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (252)
  1. package/LICENSE +21 -0
  2. package/README.md +204 -0
  3. package/curated/analysis/README.md +64 -0
  4. package/curated/domains/README.md +104 -0
  5. package/curated/literature/README.md +53 -0
  6. package/curated/research/README.md +62 -0
  7. package/curated/tools/README.md +87 -0
  8. package/curated/writing/README.md +61 -0
  9. package/index.ts +39 -0
  10. package/mcp-configs/academic-db/ChatSpatial.json +17 -0
  11. package/mcp-configs/academic-db/academia-mcp.json +17 -0
  12. package/mcp-configs/academic-db/academic-paper-explorer.json +17 -0
  13. package/mcp-configs/academic-db/academic-search-mcp-server.json +17 -0
  14. package/mcp-configs/academic-db/agentinterviews-mcp.json +17 -0
  15. package/mcp-configs/academic-db/all-in-mcp.json +17 -0
  16. package/mcp-configs/academic-db/apple-health-mcp.json +17 -0
  17. package/mcp-configs/academic-db/arxiv-latex-mcp.json +17 -0
  18. package/mcp-configs/academic-db/arxiv-mcp-server.json +17 -0
  19. package/mcp-configs/academic-db/bgpt-mcp.json +17 -0
  20. package/mcp-configs/academic-db/biomcp.json +17 -0
  21. package/mcp-configs/academic-db/biothings-mcp.json +17 -0
  22. package/mcp-configs/academic-db/catalysishub-mcp-server.json +17 -0
  23. package/mcp-configs/academic-db/clinicaltrialsgov-mcp-server.json +17 -0
  24. package/mcp-configs/academic-db/deep-research-mcp.json +17 -0
  25. package/mcp-configs/academic-db/dicom-mcp.json +17 -0
  26. package/mcp-configs/academic-db/enrichr-mcp-server.json +17 -0
  27. package/mcp-configs/academic-db/fec-mcp-server.json +17 -0
  28. package/mcp-configs/academic-db/fhir-mcp-server-themomentum.json +17 -0
  29. package/mcp-configs/academic-db/fhir-mcp.json +19 -0
  30. package/mcp-configs/academic-db/gget-mcp.json +17 -0
  31. package/mcp-configs/academic-db/google-researcher-mcp.json +17 -0
  32. package/mcp-configs/academic-db/idea-reality-mcp.json +17 -0
  33. package/mcp-configs/academic-db/legiscan-mcp.json +19 -0
  34. package/mcp-configs/academic-db/lex.json +17 -0
  35. package/mcp-configs/ai-platform/Adaptive-Graph-of-Thoughts-MCP-server.json +17 -0
  36. package/mcp-configs/ai-platform/ai-counsel.json +17 -0
  37. package/mcp-configs/ai-platform/atlas-mcp-server.json +17 -0
  38. package/mcp-configs/ai-platform/counsel-mcp.json +17 -0
  39. package/mcp-configs/ai-platform/cross-llm-mcp.json +17 -0
  40. package/mcp-configs/ai-platform/gptr-mcp.json +17 -0
  41. package/mcp-configs/browser/decipher-research-agent.json +17 -0
  42. package/mcp-configs/browser/deep-research.json +17 -0
  43. package/mcp-configs/browser/everything-claude-code.json +17 -0
  44. package/mcp-configs/browser/gpt-researcher.json +17 -0
  45. package/mcp-configs/browser/heurist-agent-framework.json +17 -0
  46. package/mcp-configs/data-platform/4everland-hosting-mcp.json +17 -0
  47. package/mcp-configs/data-platform/context-keeper.json +17 -0
  48. package/mcp-configs/data-platform/context7.json +19 -0
  49. package/mcp-configs/data-platform/contextstream-mcp.json +17 -0
  50. package/mcp-configs/data-platform/email-mcp.json +17 -0
  51. package/mcp-configs/note-knowledge/ApeRAG.json +17 -0
  52. package/mcp-configs/note-knowledge/In-Memoria.json +17 -0
  53. package/mcp-configs/note-knowledge/agent-memory.json +17 -0
  54. package/mcp-configs/note-knowledge/aimemo.json +17 -0
  55. package/mcp-configs/note-knowledge/biel-mcp.json +19 -0
  56. package/mcp-configs/note-knowledge/cognee.json +17 -0
  57. package/mcp-configs/note-knowledge/context-awesome.json +17 -0
  58. package/mcp-configs/note-knowledge/context-mcp.json +17 -0
  59. package/mcp-configs/note-knowledge/conversation-handoff-mcp.json +17 -0
  60. package/mcp-configs/note-knowledge/cortex.json +17 -0
  61. package/mcp-configs/note-knowledge/devrag.json +17 -0
  62. package/mcp-configs/note-knowledge/easy-obsidian-mcp.json +17 -0
  63. package/mcp-configs/note-knowledge/engram.json +17 -0
  64. package/mcp-configs/note-knowledge/gnosis-mcp.json +17 -0
  65. package/mcp-configs/note-knowledge/graphlit-mcp-server.json +19 -0
  66. package/mcp-configs/reference-mgr/arxiv-cli.json +17 -0
  67. package/mcp-configs/reference-mgr/arxiv-search-mcp.json +17 -0
  68. package/mcp-configs/reference-mgr/chiken.json +17 -0
  69. package/mcp-configs/reference-mgr/claude-scholar.json +17 -0
  70. package/mcp-configs/reference-mgr/devonthink-mcp.json +17 -0
  71. package/mcp-configs/registry.json +447 -0
  72. package/openclaw.plugin.json +21 -0
  73. package/package.json +61 -0
  74. package/skills/analysis/dataviz/color-accessibility-guide/SKILL.md +230 -0
  75. package/skills/analysis/dataviz/geospatial-viz-guide/SKILL.md +218 -0
  76. package/skills/analysis/dataviz/interactive-viz-guide/SKILL.md +287 -0
  77. package/skills/analysis/dataviz/network-visualization-guide/SKILL.md +195 -0
  78. package/skills/analysis/dataviz/publication-figures-guide/SKILL.md +238 -0
  79. package/skills/analysis/dataviz/python-dataviz-guide/SKILL.md +195 -0
  80. package/skills/analysis/econometrics/causal-inference-guide/SKILL.md +197 -0
  81. package/skills/analysis/econometrics/iv-regression-guide/SKILL.md +198 -0
  82. package/skills/analysis/econometrics/panel-data-guide/SKILL.md +274 -0
  83. package/skills/analysis/econometrics/robustness-checks/SKILL.md +250 -0
  84. package/skills/analysis/econometrics/stata-regression/SKILL.md +117 -0
  85. package/skills/analysis/econometrics/time-series-guide/SKILL.md +235 -0
  86. package/skills/analysis/statistics/bayesian-statistics-guide/SKILL.md +221 -0
  87. package/skills/analysis/statistics/hypothesis-testing-guide/SKILL.md +210 -0
  88. package/skills/analysis/statistics/meta-analysis-guide/SKILL.md +206 -0
  89. package/skills/analysis/statistics/nonparametric-tests-guide/SKILL.md +221 -0
  90. package/skills/analysis/statistics/power-analysis-guide/SKILL.md +240 -0
  91. package/skills/analysis/statistics/sem-guide/SKILL.md +231 -0
  92. package/skills/analysis/statistics/survival-analysis-guide/SKILL.md +195 -0
  93. package/skills/analysis/wrangling/missing-data-handling/SKILL.md +224 -0
  94. package/skills/analysis/wrangling/pandas-data-wrangling/SKILL.md +242 -0
  95. package/skills/analysis/wrangling/questionnaire-design-guide/SKILL.md +234 -0
  96. package/skills/analysis/wrangling/text-mining-guide/SKILL.md +225 -0
  97. package/skills/domains/ai-ml/computer-vision-guide/SKILL.md +213 -0
  98. package/skills/domains/ai-ml/deep-learning-papers-guide/SKILL.md +200 -0
  99. package/skills/domains/ai-ml/llm-evaluation-guide/SKILL.md +194 -0
  100. package/skills/domains/ai-ml/prompt-engineering-research/SKILL.md +233 -0
  101. package/skills/domains/ai-ml/reinforcement-learning-guide/SKILL.md +254 -0
  102. package/skills/domains/ai-ml/transformer-architecture-guide/SKILL.md +233 -0
  103. package/skills/domains/biomedical/clinical-research-guide/SKILL.md +232 -0
  104. package/skills/domains/biomedical/clinicaltrials-api/SKILL.md +177 -0
  105. package/skills/domains/biomedical/epidemiology-guide/SKILL.md +200 -0
  106. package/skills/domains/biomedical/genomics-analysis-guide/SKILL.md +270 -0
  107. package/skills/domains/business/market-analysis-guide/SKILL.md +112 -0
  108. package/skills/domains/business/strategic-management-guide/SKILL.md +154 -0
  109. package/skills/domains/chemistry/computational-chemistry-guide/SKILL.md +266 -0
  110. package/skills/domains/chemistry/retrosynthesis-guide/SKILL.md +215 -0
  111. package/skills/domains/cs/algorithms-complexity-guide/SKILL.md +194 -0
  112. package/skills/domains/cs/dblp-api/SKILL.md +129 -0
  113. package/skills/domains/cs/software-engineering-research/SKILL.md +218 -0
  114. package/skills/domains/ecology/biodiversity-data-guide/SKILL.md +296 -0
  115. package/skills/domains/ecology/conservation-biology-guide/SKILL.md +198 -0
  116. package/skills/domains/ecology/gbif-api/SKILL.md +158 -0
  117. package/skills/domains/ecology/inaturalist-api/SKILL.md +173 -0
  118. package/skills/domains/economics/behavioral-economics-guide/SKILL.md +239 -0
  119. package/skills/domains/economics/development-economics-guide/SKILL.md +181 -0
  120. package/skills/domains/economics/fred-api/SKILL.md +189 -0
  121. package/skills/domains/education/curriculum-design-guide/SKILL.md +144 -0
  122. package/skills/domains/education/learning-science-guide/SKILL.md +150 -0
  123. package/skills/domains/finance/financial-data-analysis/SKILL.md +152 -0
  124. package/skills/domains/finance/quantitative-finance-guide/SKILL.md +151 -0
  125. package/skills/domains/geoscience/climate-science-guide/SKILL.md +158 -0
  126. package/skills/domains/geoscience/gis-remote-sensing-guide/SKILL.md +129 -0
  127. package/skills/domains/humanities/digital-humanities-guide/SKILL.md +181 -0
  128. package/skills/domains/humanities/philosophy-research-guide/SKILL.md +148 -0
  129. package/skills/domains/law/courtlistener-api/SKILL.md +213 -0
  130. package/skills/domains/law/legal-research-guide/SKILL.md +250 -0
  131. package/skills/domains/math/linear-algebra-applications/SKILL.md +227 -0
  132. package/skills/domains/math/numerical-methods-guide/SKILL.md +236 -0
  133. package/skills/domains/math/oeis-api/SKILL.md +158 -0
  134. package/skills/domains/pharma/clinical-pharmacology-guide/SKILL.md +165 -0
  135. package/skills/domains/pharma/drug-development-guide/SKILL.md +177 -0
  136. package/skills/domains/physics/computational-physics-guide/SKILL.md +300 -0
  137. package/skills/domains/physics/nasa-ads-api/SKILL.md +150 -0
  138. package/skills/domains/physics/quantum-computing-guide/SKILL.md +234 -0
  139. package/skills/domains/social-science/social-research-methods/SKILL.md +194 -0
  140. package/skills/domains/social-science/survey-research-guide/SKILL.md +182 -0
  141. package/skills/literature/discovery/citation-alert-guide/SKILL.md +154 -0
  142. package/skills/literature/discovery/conference-proceedings-guide/SKILL.md +142 -0
  143. package/skills/literature/discovery/literature-mapping-guide/SKILL.md +175 -0
  144. package/skills/literature/discovery/paper-tracking-guide/SKILL.md +211 -0
  145. package/skills/literature/discovery/rss-paper-feeds/SKILL.md +214 -0
  146. package/skills/literature/discovery/semantic-scholar-recs-guide/SKILL.md +164 -0
  147. package/skills/literature/fulltext/doaj-api/SKILL.md +120 -0
  148. package/skills/literature/fulltext/interlibrary-loan-guide/SKILL.md +163 -0
  149. package/skills/literature/fulltext/open-access-guide/SKILL.md +183 -0
  150. package/skills/literature/fulltext/pmc-oai-api/SKILL.md +184 -0
  151. package/skills/literature/fulltext/preprint-servers-guide/SKILL.md +128 -0
  152. package/skills/literature/fulltext/repository-harvesting-guide/SKILL.md +207 -0
  153. package/skills/literature/fulltext/unpaywall-api/SKILL.md +113 -0
  154. package/skills/literature/metadata/altmetrics-guide/SKILL.md +132 -0
  155. package/skills/literature/metadata/citation-network-guide/SKILL.md +236 -0
  156. package/skills/literature/metadata/crossref-api/SKILL.md +133 -0
  157. package/skills/literature/metadata/datacite-api/SKILL.md +126 -0
  158. package/skills/literature/metadata/doi-resolution-guide/SKILL.md +168 -0
  159. package/skills/literature/metadata/h-index-guide/SKILL.md +183 -0
  160. package/skills/literature/metadata/journal-metrics-guide/SKILL.md +188 -0
  161. package/skills/literature/metadata/opencitations-api/SKILL.md +128 -0
  162. package/skills/literature/metadata/orcid-api/SKILL.md +136 -0
  163. package/skills/literature/metadata/orcid-integration-guide/SKILL.md +178 -0
  164. package/skills/literature/search/arxiv-api/SKILL.md +95 -0
  165. package/skills/literature/search/biorxiv-api/SKILL.md +123 -0
  166. package/skills/literature/search/boolean-search-guide/SKILL.md +199 -0
  167. package/skills/literature/search/citation-chaining-guide/SKILL.md +148 -0
  168. package/skills/literature/search/database-comparison-guide/SKILL.md +100 -0
  169. package/skills/literature/search/europe-pmc-api/SKILL.md +120 -0
  170. package/skills/literature/search/google-scholar-guide/SKILL.md +182 -0
  171. package/skills/literature/search/mesh-terms-guide/SKILL.md +164 -0
  172. package/skills/literature/search/openalex-api/SKILL.md +134 -0
  173. package/skills/literature/search/pubmed-api/SKILL.md +130 -0
  174. package/skills/literature/search/scientify-literature-survey/SKILL.md +203 -0
  175. package/skills/literature/search/semantic-scholar-api/SKILL.md +134 -0
  176. package/skills/literature/search/systematic-search-strategy/SKILL.md +214 -0
  177. package/skills/research/automation/ai-scientist-guide/SKILL.md +228 -0
  178. package/skills/research/automation/data-collection-automation/SKILL.md +248 -0
  179. package/skills/research/automation/research-workflow-automation/SKILL.md +266 -0
  180. package/skills/research/deep-research/meta-synthesis-guide/SKILL.md +174 -0
  181. package/skills/research/deep-research/research-cog/SKILL.md +153 -0
  182. package/skills/research/deep-research/scoping-review-guide/SKILL.md +217 -0
  183. package/skills/research/deep-research/systematic-review-guide/SKILL.md +250 -0
  184. package/skills/research/funding/figshare-api/SKILL.md +163 -0
  185. package/skills/research/funding/grant-writing-guide/SKILL.md +233 -0
  186. package/skills/research/funding/nsf-grant-guide/SKILL.md +206 -0
  187. package/skills/research/funding/open-science-guide/SKILL.md +255 -0
  188. package/skills/research/funding/zenodo-api/SKILL.md +174 -0
  189. package/skills/research/methodology/action-research-guide/SKILL.md +201 -0
  190. package/skills/research/methodology/experimental-design-guide/SKILL.md +236 -0
  191. package/skills/research/methodology/grad-school-guide/SKILL.md +182 -0
  192. package/skills/research/methodology/grounded-theory-guide/SKILL.md +171 -0
  193. package/skills/research/methodology/mixed-methods-guide/SKILL.md +208 -0
  194. package/skills/research/methodology/qualitative-research-guide/SKILL.md +234 -0
  195. package/skills/research/methodology/scientify-idea-generation/SKILL.md +222 -0
  196. package/skills/research/paper-review/paper-reading-assistant/SKILL.md +266 -0
  197. package/skills/research/paper-review/peer-review-guide/SKILL.md +227 -0
  198. package/skills/research/paper-review/rebuttal-writing-guide/SKILL.md +185 -0
  199. package/skills/research/paper-review/scientify-write-review-paper/SKILL.md +209 -0
  200. package/skills/tools/code-exec/jupyter-notebook-guide/SKILL.md +178 -0
  201. package/skills/tools/code-exec/python-reproducibility-guide/SKILL.md +341 -0
  202. package/skills/tools/code-exec/r-reproducibility-guide/SKILL.md +236 -0
  203. package/skills/tools/code-exec/sandbox-execution-guide/SKILL.md +221 -0
  204. package/skills/tools/diagram/mermaid-diagram-guide/SKILL.md +269 -0
  205. package/skills/tools/diagram/plantuml-guide/SKILL.md +397 -0
  206. package/skills/tools/diagram/scientific-illustration-guide/SKILL.md +225 -0
  207. package/skills/tools/document/anystyle-api/SKILL.md +199 -0
  208. package/skills/tools/document/grobid-pdf-parsing/SKILL.md +294 -0
  209. package/skills/tools/document/markdown-academic-guide/SKILL.md +217 -0
  210. package/skills/tools/document/pdf-extraction-guide/SKILL.md +321 -0
  211. package/skills/tools/knowledge-graph/knowledge-graph-construction/SKILL.md +306 -0
  212. package/skills/tools/knowledge-graph/ontology-design-guide/SKILL.md +214 -0
  213. package/skills/tools/knowledge-graph/rag-methodology-guide/SKILL.md +325 -0
  214. package/skills/tools/ocr-translate/formula-recognition-guide/SKILL.md +367 -0
  215. package/skills/tools/ocr-translate/handwriting-recognition-guide/SKILL.md +211 -0
  216. package/skills/tools/ocr-translate/latex-ocr-guide/SKILL.md +204 -0
  217. package/skills/tools/ocr-translate/multilingual-research-guide/SKILL.md +234 -0
  218. package/skills/tools/scraping/academic-web-scraping/SKILL.md +326 -0
  219. package/skills/tools/scraping/api-data-collection-guide/SKILL.md +301 -0
  220. package/skills/tools/scraping/web-scraping-ethics-guide/SKILL.md +250 -0
  221. package/skills/writing/citation/bibtex-management-guide/SKILL.md +246 -0
  222. package/skills/writing/citation/citation-style-guide/SKILL.md +248 -0
  223. package/skills/writing/citation/reference-manager-comparison/SKILL.md +208 -0
  224. package/skills/writing/citation/zotero-api/SKILL.md +188 -0
  225. package/skills/writing/composition/abstract-writing-guide/SKILL.md +188 -0
  226. package/skills/writing/composition/discussion-writing-guide/SKILL.md +194 -0
  227. package/skills/writing/composition/introduction-writing-guide/SKILL.md +194 -0
  228. package/skills/writing/composition/literature-review-writing/SKILL.md +196 -0
  229. package/skills/writing/composition/methods-section-guide/SKILL.md +185 -0
  230. package/skills/writing/composition/response-to-reviewers/SKILL.md +215 -0
  231. package/skills/writing/composition/scientific-writing-guide/SKILL.md +152 -0
  232. package/skills/writing/latex/bibliography-management-guide/SKILL.md +206 -0
  233. package/skills/writing/latex/latex-drawing-guide/SKILL.md +234 -0
  234. package/skills/writing/latex/latex-ecosystem-guide/SKILL.md +240 -0
  235. package/skills/writing/latex/math-typesetting-guide/SKILL.md +231 -0
  236. package/skills/writing/latex/overleaf-collaboration-guide/SKILL.md +211 -0
  237. package/skills/writing/latex/tikz-diagrams-guide/SKILL.md +211 -0
  238. package/skills/writing/polish/academic-translation-guide/SKILL.md +175 -0
  239. package/skills/writing/polish/academic-writing-refiner/SKILL.md +143 -0
  240. package/skills/writing/polish/ai-writing-humanizer/SKILL.md +178 -0
  241. package/skills/writing/polish/grammar-checker-guide/SKILL.md +184 -0
  242. package/skills/writing/polish/plagiarism-detection-guide/SKILL.md +167 -0
  243. package/skills/writing/templates/beamer-presentation-guide/SKILL.md +263 -0
  244. package/skills/writing/templates/conference-paper-template/SKILL.md +219 -0
  245. package/skills/writing/templates/thesis-template-guide/SKILL.md +200 -0
  246. package/skills/writing/templates/thesis-writing-guide/SKILL.md +220 -0
  247. package/src/tools/arxiv.ts +131 -0
  248. package/src/tools/crossref.ts +112 -0
  249. package/src/tools/openalex.ts +174 -0
  250. package/src/tools/pubmed.ts +166 -0
  251. package/src/tools/semantic-scholar.ts +108 -0
  252. package/src/tools/unpaywall.ts +58 -0
@@ -0,0 +1,248 @@
---
name: data-collection-automation
description: "Automate survey deployment, data collection, and pipeline management"
metadata:
  openclaw:
    emoji: "robot"
    category: "research"
    subcategory: "automation"
    keywords: ["data collection", "survey automation", "pipeline", "Qualtrics API", "research automation", "ETL"]
    source: "wentor-research-plugins"
---

# Data Collection Automation Guide

A skill for automating research data collection, survey deployment, and data pipeline management. Covers survey platform APIs, automated data retrieval, quality checks, ETL pipelines, and scheduling for longitudinal studies.

## Survey Platform APIs

### Qualtrics API

```python
import os
import json
import urllib.parse
import urllib.request
import time


def export_qualtrics_responses(survey_id: str,
                               file_format: str = "csv") -> str:
    """
    Export survey responses from Qualtrics via API.

    Args:
        survey_id: The Qualtrics survey ID (SV_...)
        file_format: Export format (csv, json, spss)
    """
    api_token = os.environ["QUALTRICS_API_TOKEN"]
    data_center = os.environ["QUALTRICS_DATACENTER"]
    base_url = f"https://{data_center}.qualtrics.com/API/v3"

    headers = {
        "X-API-TOKEN": api_token,
        "Content-Type": "application/json"
    }

    # Step 1: Start export
    export_data = json.dumps({
        "format": file_format,
        "compress": False
    }).encode("utf-8")

    req = urllib.request.Request(
        f"{base_url}/surveys/{survey_id}/export-responses",
        data=export_data,
        headers=headers
    )
    response = json.loads(urllib.request.urlopen(req).read())
    progress_id = response["result"]["progressId"]

    # Step 2: Poll for completion
    status = "inProgress"
    while status == "inProgress":
        time.sleep(2)
        req = urllib.request.Request(
            f"{base_url}/surveys/{survey_id}/export-responses/{progress_id}",
            headers=headers
        )
        check = json.loads(urllib.request.urlopen(req).read())
        status = check["result"]["status"]

    if status != "complete":
        raise RuntimeError(f"Qualtrics export ended with status: {status}")

    file_id = check["result"]["fileId"]

    # Step 3: Download file
    req = urllib.request.Request(
        f"{base_url}/surveys/{survey_id}/export-responses/{file_id}/file",
        headers=headers
    )
    file_data = urllib.request.urlopen(req).read()

    output_path = f"responses_{survey_id}.{file_format}"
    with open(output_path, "wb") as f:
        f.write(file_data)

    return output_path
```
86
+
87
+ ### REDCap API
88
+
89
+ ```python
90
+ def export_redcap_records(api_url: str, fields: list[str] = None) -> list:
91
+ """
92
+ Export records from a REDCap project.
93
+
94
+ Args:
95
+ api_url: REDCap API endpoint URL
96
+ fields: List of field names to export (None = all fields)
97
+ """
98
+ api_token = os.environ["REDCAP_API_TOKEN"]
99
+
100
+ data = {
101
+ "token": api_token,
102
+ "content": "record",
103
+ "format": "json",
104
+ "type": "flat"
105
+ }
106
+
107
+ if fields:
108
+ data["fields"] = ",".join(fields)
109
+
110
+ encoded = urllib.parse.urlencode(data).encode("utf-8")
111
+ req = urllib.request.Request(api_url, data=encoded)
112
+ response = urllib.request.urlopen(req)
113
+
114
+ return json.loads(response.read())
115
+ ```
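REDCap expects a form-encoded POST body rather than JSON. As a self-contained illustration of what `urllib.parse.urlencode` produces from a payload like the one above (the token is omitted and field names are made up):

```python
import urllib.parse

# A REDCap-style payload, minus the API token
payload = {"content": "record", "format": "json", "type": "flat",
           "fields": "record_id,age"}

# urlencode form-encodes the dict; reserved characters like "," are percent-escaped
body = urllib.parse.urlencode(payload)
```

The resulting string is what gets byte-encoded and sent as the request body.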
116
+
117
+ ## Automated Data Quality Checks
118
+
119
+ ### Validation Pipeline
120
+
121
+ ```python
122
+ import pandas as pd
123
+ from datetime import datetime
124
+
125
+
126
+ def validate_survey_data(df: pd.DataFrame,
127
+ rules: dict) -> dict:
128
+ """
129
+ Run automated data quality checks on collected data.
130
+
131
+ Args:
132
+ df: DataFrame of survey responses
133
+ rules: Dict of column -> validation rule pairs
134
+ """
135
+ issues = []
136
+
137
+ # Check for duplicates
138
+ dupes = df.duplicated(subset=["respondent_id"]).sum()
139
+ if dupes > 0:
140
+ issues.append(f"Found {dupes} duplicate respondent IDs")
141
+
142
+ # Check completion rates
143
+ completion = df.notna().mean()
144
+ low_completion = completion[completion < 0.5]
145
+ for col in low_completion.index:
146
+ issues.append(f"Column '{col}' has {low_completion[col]:.0%} completion")
147
+
148
+ # Check value ranges
149
+ for col, rule in rules.items():
150
+ if col not in df.columns:
151
+ continue
152
+ if "min" in rule:
153
+ violations = (df[col] < rule["min"]).sum()
154
+ if violations > 0:
155
+ issues.append(f"{violations} values below minimum in '{col}'")
156
+ if "max" in rule:
157
+ violations = (df[col] > rule["max"]).sum()
158
+ if violations > 0:
159
+ issues.append(f"{violations} values above maximum in '{col}'")
160
+
161
+ # Check for speeding (unusually fast completion)
162
+ if "duration_seconds" in df.columns:
163
+ median_time = df["duration_seconds"].median()
164
+ speeders = (df["duration_seconds"] < median_time * 0.3).sum()
165
+ if speeders > 0:
166
+ issues.append(f"{speeders} respondents completed in <30% of median time")
167
+
168
+ return {
169
+ "n_records": len(df),
170
+ "n_issues": len(issues),
171
+ "issues": issues,
172
+ "timestamp": datetime.now().isoformat()
173
+ }
174
+ ```
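To make the shape of the `rules` dict concrete, here is a self-contained sketch of the range check on toy data (the column names and bounds are illustrative, not part of any platform):

```python
import pandas as pd

# One rule per column; "min"/"max" keys are optional
rules = {"age": {"min": 18, "max": 99}, "likert_q1": {"min": 1, "max": 7}}
df = pd.DataFrame({"age": [25, 17, 42], "likert_q1": [3, 7, 9]})

issues = []
for col, rule in rules.items():
    below = int((df[col] < rule.get("min", float("-inf"))).sum())
    above = int((df[col] > rule.get("max", float("inf"))).sum())
    if below:
        issues.append(f"{below} values below minimum in '{col}'")
    if above:
        issues.append(f"{above} values above maximum in '{col}'")
```

Here one age falls below 18 and one Likert response exceeds 7, so two issues are flagged.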

## ETL Pipeline for Research Data

### Scheduled Data Retrieval

```python
def research_etl_pipeline(sources: list[dict],
                          output_dir: str) -> dict:
    """
    Extract, transform, and load research data from multiple sources.

    Args:
        sources: List of data source configurations
        output_dir: Directory to save processed data
    """
    results = {}

    for source in sources:
        name = source["name"]

        # Extract
        if source["type"] == "qualtrics":
            raw_path = export_qualtrics_responses(source["survey_id"])
            df = pd.read_csv(raw_path)
        elif source["type"] == "redcap":
            records = export_redcap_records(source["api_url"])
            df = pd.DataFrame(records)
        elif source["type"] == "csv_url":
            df = pd.read_csv(source["url"])
        else:
            continue

        # Transform
        df = df.dropna(how="all")
        df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

        # Load
        timestamp = datetime.now().strftime("%Y%m%d")
        output_path = f"{output_dir}/{name}_{timestamp}.csv"
        df.to_csv(output_path, index=False)

        results[name] = {
            "records": len(df),
            "columns": len(df.columns),
            "output": output_path
        }

    return results
```
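A `sources` list for this pipeline might look like the following; the survey ID and URLs are placeholders, not real endpoints:

```python
sources = [
    {"name": "wave1_survey", "type": "qualtrics", "survey_id": "SV_0example00000"},
    {"name": "ehr_extract", "type": "redcap", "api_url": "https://redcap.example.edu/api/"},
    {"name": "census_controls", "type": "csv_url", "url": "https://example.org/controls.csv"},
]

# Every entry needs "name" and "type"; the remaining keys depend on the type
well_formed = all({"name", "type"} <= set(s) for s in sources)
```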

## Scheduling and Monitoring

### Cron-Based Scheduling

```bash
# Run data collection pipeline daily at 6 AM
# crontab -e
0 6 * * * cd /path/to/project && python collect_data.py >> logs/collection.log 2>&1
```

### Monitoring Checklist

For longitudinal studies, automate monitoring of:

- Response rates per wave (alert if below threshold)
- Data quality metrics (completion, speeding, straight-lining)
- API quota usage (stay within rate limits)
- Storage usage and backup status
- Participant dropout patterns
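The first item on the list reduces to a simple threshold check. A minimal sketch — the function name and the 60% default threshold are illustrative choices, not platform defaults:

```python
def response_rate_ok(invited: int, completed: int, threshold: float = 0.60) -> bool:
    """Return True when this wave's response rate meets the alert threshold."""
    rate = completed / invited if invited else 0.0
    return rate >= threshold

# A wave with 240 invitations and 132 completions has a 55% response rate,
# which falls below a 60% threshold and should trigger an alert
needs_alert = not response_rate_ok(240, 132)
```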

## Ethical Considerations

Always ensure automated data collection complies with your IRB/ethics board approval. In particular:

- Store API tokens securely in environment variables, never in code.
- Encrypt data at rest.
- Log all data access for audit trails.
- Respect rate limits on external APIs.
- Check consent status automatically before processing participant data.
@@ -0,0 +1,266 @@
---
name: research-workflow-automation
description: "Automate repetitive research tasks with pipelines, schedulers, and scripting"
metadata:
  openclaw:
    emoji: "gear"
    category: "research"
    subcategory: "automation"
    keywords: ["workflow management", "pipeline scheduler", "research automation", "scientific workflow", "task automation"]
    source: "wentor"
---

# Research Workflow Automation

A skill for automating repetitive research tasks using workflow managers, pipeline tools, and scripting. Covers data pipeline design, experiment tracking, automated reporting, and reproducible research workflows.

## Workflow Management Tools

### Tool Comparison

| Tool | Language | Best For | Complexity | License |
|------|----------|----------|------------|---------|
| Snakemake | Python | Bioinformatics, data pipelines | Medium | MIT |
| Nextflow | Groovy/DSL | Genomics, HPC | Medium | Apache 2.0 |
| Prefect | Python | Data engineering, ML | Medium | Apache 2.0 |
| Airflow | Python | Scheduled ETL pipelines | High | Apache 2.0 |
| Make | Makefile | Simple file-based pipelines | Low | GPL |
| DVC | YAML/CLI | ML experiment tracking | Low | Apache 2.0 |

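### DVC: Minimal Pipeline Sketch

For the lightest entry in the table, a DVC pipeline is just a declarative `dvc.yaml`; `dvc repro` reruns only stages whose dependencies changed. This is a sketch — the stage names and script paths are illustrative:

```yaml
# dvc.yaml — each stage declares its command, inputs, and outputs
stages:
  clean:
    cmd: python scripts/clean_data.py data/raw/dataset.csv data/cleaned/dataset.parquet
    deps:
      - data/raw/dataset.csv
      - scripts/clean_data.py
    outs:
      - data/cleaned/dataset.parquet
  analyze:
    cmd: python scripts/run_analysis.py data/cleaned/dataset.parquet results/statistics.json
    deps:
      - data/cleaned/dataset.parquet
    outs:
      - results/statistics.json
```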
### Snakemake: Scientific Workflow Example

```python
# Snakefile for a research data pipeline

# Configuration
configfile: "config.yaml"

# Define the final outputs
rule all:
    input:
        "results/figures/main_figure.pdf",
        "results/tables/summary_table.csv"

# Step 1: Download and preprocess data
rule download_data:
    output:
        "data/raw/{dataset}.csv"
    params:
        url = lambda wildcards: config["datasets"][wildcards.dataset]["url"]
    shell:
        "curl -L {params.url} -o {output}"

rule clean_data:
    input:
        "data/raw/{dataset}.csv"
    output:
        "data/cleaned/{dataset}.parquet"
    script:
        "scripts/clean_data.py"

# Step 2: Run analysis
rule statistical_analysis:
    input:
        expand("data/cleaned/{dataset}.parquet",
               dataset=config["datasets"].keys())
    output:
        "results/analysis/statistics.json",
        "results/analysis/model_fits.pkl"
    threads: 4
    resources:
        mem_mb = 8000
    script:
        "scripts/run_analysis.py"

# Step 3: Generate figures
rule create_figures:
    input:
        "results/analysis/statistics.json"
    output:
        "results/figures/main_figure.pdf"
    script:
        "scripts/create_figures.py"

# Step 4: Generate summary table
rule summary_table:
    input:
        "results/analysis/statistics.json"
    output:
        "results/tables/summary_table.csv"
    script:
        "scripts/create_tables.py"
```

```bash
# Execute the full pipeline
snakemake --cores 8 --use-conda

# Visualize the workflow DAG
snakemake --dag | dot -Tpdf > workflow.pdf

# Dry run to see what would be executed
snakemake -n
```

## Make-Based Pipelines

### Simple Makefile for Research

```makefile
# Makefile for a research project
.PHONY: all clean paper

# Default target
all: paper

# Data acquisition and cleaning
data/cleaned/dataset.parquet: data/raw/dataset.csv scripts/clean.py
	python scripts/clean.py --input $< --output $@

# Analysis
results/statistics.json: data/cleaned/dataset.parquet scripts/analyze.py
	python scripts/analyze.py --input $< --output $@

# Figures
results/figures/%.pdf: results/statistics.json scripts/plot_%.py
	python scripts/plot_$*.py --input $< --output $@

# Compile paper
paper: results/figures/main.pdf results/figures/supplement.pdf
	cd paper && latexmk -pdf main.tex

# Clean all generated files
clean:
	rm -rf data/cleaned/ results/ paper/*.pdf paper/*.aux paper/*.log
```

## Experiment Tracking

### MLflow for Research Experiments

```python
import mlflow


def track_experiment(experiment_name: str, params: dict,
                     metrics: dict, artifacts: list[str] | None = None):
    """
    Track a research experiment with MLflow.

    Args:
        experiment_name: Name of the experiment series
        params: Hyperparameters or configuration
        metrics: Results metrics
        artifacts: Paths to output files to log
    """
    mlflow.set_experiment(experiment_name)

    with mlflow.start_run():
        # Log parameters
        for key, value in params.items():
            mlflow.log_param(key, value)

        # Log metrics
        for key, value in metrics.items():
            mlflow.log_metric(key, value)

        # Log artifacts (figures, data files, etc.)
        if artifacts:
            for artifact_path in artifacts:
                mlflow.log_artifact(artifact_path)

        # Log the full configuration as JSON
        mlflow.log_dict(params, "config.json")

        run_id = mlflow.active_run().info.run_id
        print(f"Experiment logged: {run_id}")
        return run_id

# Example: track a statistical analysis
track_experiment(
    experiment_name="treatment_effect_study",
    params={
        'model': 'linear_regression',
        'covariates': 'age,sex,baseline_score',
        'alpha': 0.05,
        'data_version': 'v2.3'
    },
    metrics={
        'r_squared': 0.42,
        'treatment_effect': 0.35,
        'p_value': 0.003,
        'n_subjects': 245
    },
    artifacts=['results/figures/main.pdf']
)
```
198
+
199
+ ## Automated Reporting
200
+
201
+ ### Generate Reports from Analysis Results
202
+
203
+ ```python
204
+ from jinja2 import Template
205
+ from datetime import datetime
206
+
207
+ def generate_report(results: dict,
208
+ output_path: str) -> str:
209
+ """
210
+ Auto-generate a research report from analysis results.
211
+ """
212
+ report_template = Template("""
213
+ # Analysis Report
214
+ Generated: {{ timestamp }}
215
+
216
+ ## Summary Statistics
217
+ - Sample size: {{ results.n }}
218
+ - Mean outcome: {{ "%.2f"|format(results.mean) }}
219
+ - Standard deviation: {{ "%.2f"|format(results.std) }}
220
+
221
+ ## Main Results
222
+ - Treatment effect: {{ "%.3f"|format(results.effect) }}
223
+ (95% CI: {{ "%.3f"|format(results.ci_lower) }} to {{ "%.3f"|format(results.ci_upper) }})
224
+ - p-value: {{ "%.4f"|format(results.p_value) }}
225
+ - Effect size (Cohen's d): {{ "%.2f"|format(results.cohens_d) }}
226
+
227
+ ## Interpretation
228
+ {% if results.p_value < 0.05 %}
229
+ The treatment effect is statistically significant at the 5% level.
230
+ {% else %}
231
+ The treatment effect is not statistically significant at the 5% level.
232
+ {% endif %}
233
+ """)
234
+
235
+ report = report_template.render(
236
+ results=results,
237
+ timestamp=datetime.now().strftime('%Y-%m-%d %H:%M')
238
+ )
239
+
240
+ with open(output_path, 'w') as f:
241
+ f.write(report)
242
+
243
+ return output_path
244
+ ```
245
+
246
+ ## Scheduling and Cron Jobs
247
+
248
+ ### Automated Data Collection
249
+
250
+ ```bash
251
+ # Crontab entry: run daily at 6 AM
252
+ 0 6 * * * cd /home/researcher/project && python scripts/daily_data_fetch.py >> logs/fetch.log 2>&1
253
+
254
+ # Weekly analysis update (every Monday at 9 AM)
255
+ 0 9 * * 1 cd /home/researcher/project && snakemake --cores 4 >> logs/pipeline.log 2>&1
256
+ ```
257
+
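The `daily_data_fetch.py` script invoked by the first cron entry is not shown in this guide; a minimal fail-fast skeleton (all names hypothetical) that follows the logging and validation practices below might look like:

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("daily_fetch")

def fetch_daily_data(fetch_fn, output_path: str) -> dict:
    """Run one fetch, validate the payload, and write a timestamped JSON file."""
    started = datetime.now(timezone.utc).isoformat()
    records = fetch_fn()
    # Fail fast: never write an empty or malformed payload silently.
    if not isinstance(records, list) or not records:
        log.error("fetch returned no records; aborting")
        raise ValueError("empty fetch result")
    with open(output_path, "w") as f:
        json.dump({"fetched_at": started, "records": records}, f)
    log.info("wrote %d records to %s", len(records), output_path)
    return {"n_records": len(records), "fetched_at": started}
```

Because the script raises on bad input, cron's `2>&1` redirection captures the error in the log file rather than letting a silent failure corrupt downstream pipeline steps.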
258
+ ## Best Practices
259
+
260
+ 1. **Version everything**: Code, data, configurations, and environments
261
+ 2. **Idempotent pipelines**: Running the same pipeline twice produces the same output
262
+ 3. **Fail fast**: Validate inputs early; do not process bad data silently
263
+ 4. **Log everything**: Record timestamps, parameters, and random seeds
264
+ 5. **Separate configuration from code**: Use YAML/JSON config files, not hardcoded values
265
+ 6. **Test with small data first**: Use a 1% sample to verify the pipeline before full runs
266
+ 7. **Document the workflow**: A README explaining how to run the full pipeline from scratch
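Practices 3 and 5 above can be combined in one small loader; this is a minimal sketch using a plain JSON config file (the key names are illustrative):

```python
import json
from pathlib import Path

def load_config(path: str, required: tuple = ("model", "alpha")) -> dict:
    """Load an analysis configuration from a JSON file, failing fast on missing keys."""
    config = json.loads(Path(path).read_text())
    missing = [key for key in required if key not in config]
    if missing:
        raise KeyError(f"config missing required keys: {missing}")
    return config
```

Keeping values like `alpha` or `data_version` in the config file, and validating them before any data is touched, means a misconfigured run stops immediately instead of producing half-finished outputs.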
@@ -0,0 +1,174 @@
1
+ ---
2
+ name: meta-synthesis-guide
3
+ description: "Conduct qualitative meta-synthesis and evidence synthesis methods"
4
+ metadata:
5
+ openclaw:
6
+ emoji: "chains"
7
+ category: "research"
8
+ subcategory: "deep-research"
9
+ keywords: ["meta-synthesis", "qualitative evidence synthesis", "meta-ethnography", "thematic synthesis", "systematic review"]
10
+ source: "wentor-research-plugins"
11
+ ---
12
+
13
+ # Meta-Synthesis Guide
14
+
15
+ A skill for conducting qualitative meta-synthesis -- the systematic integration of findings across multiple qualitative studies. Covers meta-ethnography (Noblit & Hare), thematic synthesis (Thomas & Harden), framework synthesis, and quality appraisal of qualitative studies.
16
+
17
+ ## What Is Qualitative Meta-Synthesis?
18
+
19
+ ### Overview
20
+
21
+ ```
22
+ Meta-synthesis is to qualitative research what meta-analysis
23
+ is to quantitative research -- it systematically combines
24
+ findings from multiple studies to produce higher-order
25
+ interpretations.
26
+
27
+ Key differences from meta-analysis:
28
+ - Interpretive, not statistical aggregation
29
+ - Aims to generate new understanding, not average effect sizes
30
+ - Synthesizes themes, concepts, and metaphors across studies
31
+ - Product is a new interpretation, not a pooled statistic
32
+ ```
33
+
34
+ ### When to Use Meta-Synthesis
35
+
36
+ ```
37
+ Appropriate when:
38
+ - Multiple qualitative studies exist on a topic
39
+ - You want to build theory or deepen understanding
40
+ - Individual studies have limited scope but collectively cover a phenomenon
41
+ - Policy or practice needs an integrated evidence base from qualitative work
42
+
43
+ Not appropriate when:
44
+ - Studies are too heterogeneous in topic to meaningfully compare
45
+ - Fewer than 3 qualitative studies exist
46
+ - The goal is to measure effect sizes (use meta-analysis instead)
47
+ ```
48
+
49
+ ## Meta-Ethnography (Noblit & Hare)
50
+
51
+ ### Seven-Step Process
52
+
53
+ ```python
54
+ def meta_ethnography_steps() -> dict:
55
+ """
56
+ The seven steps of meta-ethnography (Noblit & Hare, 1988).
57
+ """
58
+ return {
59
+ "step_1_getting_started": {
60
+ "description": "Identify the research question and intellectual interest",
61
+ "output": "Clear synthesis question"
62
+ },
63
+ "step_2_deciding_what_is_relevant": {
64
+ "description": "Systematic search and selection of qualitative studies",
65
+ "output": "Final set of included studies",
66
+ "note": "Use PRISMA flow diagram to document selection"
67
+ },
68
+ "step_3_reading_the_studies": {
69
+ "description": (
70
+ "Read and re-read included studies carefully. "
71
+ "Identify key metaphors, themes, and concepts in each."
72
+ ),
73
+ "output": "List of first-order (participant quotes) and "
74
+ "second-order (author interpretations) constructs"
75
+ },
76
+ "step_4_determining_how_studies_are_related": {
77
+ "description": (
78
+ "Create a grid mapping constructs across studies. "
79
+ "Determine if studies are reciprocal (about similar things), "
80
+ "refutational (contradictory), or form a line of argument."
81
+ ),
82
+ "output": "Construct comparison table"
83
+ },
84
+ "step_5_translating_studies": {
85
+ "description": (
86
+ "Translate the concepts of one study into the terms of another. "
87
+ "This is the core analytical step -- finding common meaning "
88
+ "expressed in different language."
89
+ ),
90
+ "output": "Translated constructs across all studies"
91
+ },
92
+ "step_6_synthesizing_translations": {
93
+ "description": (
94
+ "Develop third-order constructs -- new interpretations "
95
+ "that go beyond what any single study found."
96
+ ),
97
+ "output": "Third-order constructs (the synthesis)"
98
+ },
99
+ "step_7_expressing_the_synthesis": {
100
+ "description": "Write up the synthesis in a form accessible to the audience",
101
+ "output": "Published meta-synthesis paper"
102
+ }
103
+ }
104
+ ```
105
+
106
+ ### Types of Synthesis
107
+
108
+ ```
109
+ Reciprocal translation:
110
+ Studies are about similar things. Translate them into each other.
111
+ "Study A calls it 'navigating uncertainty'; Study B calls it
112
+ 'managing ambiguity'; Study C calls it 'living with not knowing'.
113
+ The overarching construct is 'Tolerating the Unknown.'"
114
+
115
+ Refutational synthesis:
116
+ Studies contradict each other. Explore why.
117
+ "Study A found empowerment through peer support; Study B found
118
+ peer support increased anxiety. This refutation may be explained
119
+ by the stage of illness at which support was received."
120
+
121
+ Line of argument synthesis:
122
+ Studies address different aspects that together form a whole.
123
+ "Study A covers diagnosis, B covers treatment, C covers recovery.
124
+ Together they reveal a trajectory of 'Reconstructing Identity.'"
125
+ ```
126
+
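The reciprocal translation above can be represented as a simple construct grid; this sketch (study names and labels are illustrative, taken from the example) records how each study's local label maps onto one overarching third-order construct:

```python
def reciprocal_translation(grid: dict, overarching: str) -> dict:
    """Map each study's local construct label onto one third-order construct.

    grid: {study_id: local_construct_label}
    """
    return {
        "third_order_construct": overarching,
        "translations": [
            {"study": study, "local_label": label, "translated_to": overarching}
            for study, label in grid.items()
        ],
    }
```

A table like this, kept per construct, doubles as the evidence trail that reporting standards such as ENTREQ ask for: each third-order construct can be traced back to the study-level labels it subsumes.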
127
+ ## Thematic Synthesis (Thomas & Harden)
128
+
129
+ ### Three-Stage Approach
130
+
131
+ ```
132
+ Stage 1: Free coding of findings
133
+ - Treat the findings sections of included studies as data
134
+ - Code them line by line, as in primary qualitative analysis
135
+
136
+ Stage 2: Organizing codes into descriptive themes
137
+ - Group codes into descriptive themes
138
+ - These are "close to" the original studies
139
+
140
+ Stage 3: Generating analytical themes
141
+ - Go beyond the content of the original studies
142
+ - Generate new interpretive constructs
143
+ - Answer the synthesis research question
144
+ ```
145
+
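Stages 2 and 3 can be sketched as simple data transformations; in real thematic synthesis the groupings emerge from iterative comparison rather than being supplied up front, so the codes and theme labels here are purely illustrative:

```python
from collections import defaultdict

def group_codes(code_to_theme: dict) -> dict:
    """Stage 2 sketch: organize free codes under descriptive themes."""
    themes = defaultdict(list)
    for code, theme in code_to_theme.items():
        themes[theme].append(code)
    return dict(themes)

def analytical_theme(descriptive_themes: dict, label: str) -> dict:
    """Stage 3 sketch: subsume descriptive themes under a new interpretive construct."""
    return {"analytical_theme": label,
            "built_from": sorted(descriptive_themes)}
```

The point of the structure is the audit trail: every analytical theme lists the descriptive themes it was built from, and every descriptive theme lists its underlying codes.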
146
+ ## Quality Appraisal
147
+
148
+ ### Assessing Qualitative Studies
149
+
150
+ ```
151
+ Tools for appraising qualitative study quality:
152
+
153
+ CASP Qualitative Checklist (10 items):
154
+ - Was there a clear statement of aims?
155
+ - Is a qualitative methodology appropriate?
156
+ - Was the research design appropriate?
157
+ - Was the recruitment strategy appropriate?
158
+ - Was data collected in a way that addressed the research issue?
159
+ - Was the relationship between researcher and participants considered?
160
+ - Were ethical issues considered?
161
+ - Was data analysis sufficiently rigorous?
162
+ - Was there a clear statement of findings?
163
+ - How valuable is the research?
164
+
165
+ JBI Checklist for Qualitative Research (10 criteria)
166
+
167
+ Decision: Include all studies or exclude low-quality studies?
168
+ - Sensitivity analysis: Run the synthesis with and without
169
+ lower-quality studies to see if conclusions change.
170
+ ```
171
+
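The sensitivity-analysis decision above can be sketched as a two-way split. Note that CASP itself does not prescribe a numeric score or cutoff; summing "yes" answers and choosing a threshold is an informal convention the analyst must justify:

```python
def casp_score(answers: dict) -> int:
    """Count 'yes' answers on a CASP-style checklist (informal convention only)."""
    return sum(bool(v) for v in answers.values())

def sensitivity_split(study_scores: dict, threshold: int = 8):
    """Partition studies by an analyst-chosen quality threshold so the synthesis
    can be run with and without the lower-scoring studies."""
    higher = [s for s, score in study_scores.items() if score >= threshold]
    lower = [s for s, score in study_scores.items() if score < threshold]
    return higher, lower
```

If the conclusions hold with and without the lower-scoring studies, the synthesis is robust to quality concerns; if they change, report both versions.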
172
+ ## Reporting Standards
173
+
174
+ Use the ENTREQ (Enhancing Transparency in Reporting the Synthesis of Qualitative Research) statement. Report: the synthesis methodology used, the search strategy and selection criteria, quality appraisal results, a table of included studies with their key constructs, the synthesis process with clear evidence trails, and how third-order constructs were derived from the primary studies.