npm - @wentorai/research-plugins - Versions diffs - 1.0.0 - Mend

@wentorai/research-plugins 1.0.0


Files changed (252)

package/skills/research/methodology/experimental-design-guide/SKILL.md ADDED

@@ -0,0 +1,236 @@
+---
+name: experimental-design-guide
+description: "Design rigorous experiments using DOE, factorial designs, and response surfaces"
+metadata:
+  openclaw:
+    emoji: "test_tube"
+    category: "research"
+    subcategory: "methodology"
+    keywords: ["design of experiments", "DOE", "factorial design", "response surface methodology", "experimental design"]
+    source: "wentor"
+---
+# Experimental Design Guide
+A skill for designing rigorous experiments using formal Design of Experiments (DOE) methodology. Covers factorial designs, fractional factorials, response surface methods, and optimal design strategies for scientific research.
+## Fundamental Principles
+### Fisher's Three Principles
+1. **Randomization**: Assign experimental units to treatments randomly to eliminate systematic bias
+2. **Replication**: Include enough replicates to estimate experimental error and ensure statistical power
+3. **Blocking**: Group similar experimental units to reduce nuisance variability
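The three principles combine naturally in a randomized complete block design: units are grouped into blocks, every treatment appears once per block (so each block is one replicate), and the order is randomized within each block. The sketch below is one minimal way to express that in plain Python; the function name `randomized_block_assignment`, the unit labels, and the seed value are illustrative assumptions, not part of the skill.

```python
import random

def randomized_block_assignment(blocks, treatments, seed=2024):
    """Assign every treatment once within each block, in a random order.

    blocks: dict mapping a block label to the list of unit IDs in that block.
    Each block must hold exactly len(treatments) units (an assumption of this sketch).
    """
    rng = random.Random(seed)  # seeded so the allocation can be reproduced
    assignment = {}
    for label, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {label!r} needs {len(treatments)} units")
        shuffled = list(treatments)
        rng.shuffle(shuffled)  # randomization happens inside each block
        for unit, treatment in zip(units, shuffled):
            assignment[unit] = treatment
    return assignment

# Hypothetical example: two batches (blocks) of three plots each
blocks = {"batch_1": ["p1", "p2", "p3"], "batch_2": ["p4", "p5", "p6"]}
plan = randomized_block_assignment(blocks, ["control", "low", "high"])
for unit, treatment in plan.items():
    print(unit, treatment)
```

Because the shuffle is confined to each block, batch-to-batch differences cannot be confounded with the treatment effect.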
+### Sample Size and Power Analysis
+```python
+import numpy as np
+from statsmodels.stats.power import TTestIndPower
+def power_analysis_ttest(effect_size: float, alpha: float = 0.05,
+                          power: float = 0.80, ratio: float = 1.0) -> dict:
+    """
+    Calculate required sample size for a two-sample t-test.
+    Args:
+        effect_size: Cohen's d (expected effect size)
+        alpha: Significance level
+        power: Desired statistical power
+        ratio: Ratio of n2/n1 (for unequal groups)
+    """
+    # Solve for the size of group 1; group 2 then has n1 * ratio observations
+    n1 = TTestIndPower().solve_power(
+        effect_size=effect_size,
+        alpha=alpha,
+        power=power,
+        ratio=ratio,
+        alternative='two-sided'
+    )
+    n_group1 = int(np.ceil(n1))
+    n_group2 = int(np.ceil(n1 * ratio))
+    return {
+        'n_per_group': n_group1,  # size of group 1 (both groups when ratio == 1)
+        'n_group2': n_group2,
+        'total_n': n_group1 + n_group2,
+        'effect_size_d': effect_size,
+        'alpha': alpha,
+        'power': power,
+        'interpretation': (
+            f"Need {n_group1} in group 1 and {n_group2} in group 2 "
+            f"(total N = {n_group1 + n_group2}) "
+            f"to detect d = {effect_size} with {power*100:.0f}% power."
+        )
+    }
+# Example: medium effect size
+result = power_analysis_ttest(effect_size=0.5, alpha=0.05, power=0.80)
+print(result['interpretation'])
+```
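As a sanity check on a power calculation, the usual normal approximation for equal groups is n per group ≈ 2 (z_{1-alpha/2} + z_{power})^2 / d^2. The sketch below evaluates it with only the standard library; `approx_n_per_group` is a hypothetical helper name, and because it ignores the t distribution's heavier tails it should come out slightly below a t-based answer.

```python
from math import ceil
from statistics import NormalDist

def approx_n_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group (two-sided, equal group sizes)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

print(approx_n_per_group(0.5))  # 63
```

If the approximation and the exact calculation disagree by more than a few units, that is a hint to recheck the effect size or the sidedness that was passed in.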
+## Full Factorial Designs
+### 2^k Factorial Design
+```python
+import itertools
+import pandas as pd
+def create_factorial_design(factors: dict, replicates: int = 3) -> pd.DataFrame:
+    """
+    Create a full factorial experimental design.
+    Args:
+        factors: Dict mapping factor names to lists of levels
+                 e.g., {'Temperature': [60, 80], 'Pressure': [1, 2], 'Catalyst': ['A', 'B']}
+        replicates: Number of replicates per combination
+    """
+    factor_names = list(factors.keys())
+    factor_levels = list(factors.values())
+    # Generate all combinations
+    combinations = list(itertools.product(*factor_levels))
+    # Create design matrix with replicates
+    rows = []
+    run_order = 0
+    for rep in range(replicates):
+        for combo in combinations:
+            run_order += 1
+            row = {'Run': run_order, 'Replicate': rep + 1}
+            for name, value in zip(factor_names, combo):
+                row[name] = value
+            row['Response'] = None  # To be filled with experimental data
+            rows.append(row)
+    design = pd.DataFrame(rows)
+    # Randomize run order (fixed seed so the run sheet can be regenerated)
+    design = design.sample(frac=1, random_state=42).reset_index(drop=True)
+    design['RandomizedRun'] = range(1, len(design) + 1)
+    print("Design summary:")
+    print(f"  Factors: {len(factors)}")
+    print(f"  Levels per factor: {[len(v) for v in factors.values()]}")
+    print(f"  Total treatments: {len(combinations)}")
+    print(f"  Replicates: {replicates}")
+    print(f"  Total runs: {len(design)}")
+    return design
+# Example: 2^3 factorial
+design = create_factorial_design({
+    'Temperature': [60, 80],
+    'Pressure': [1, 2],
+    'Catalyst': ['A', 'B']
+}, replicates=3)
+```
+### Analyzing Factorial Experiments
+```python
+import pandas as pd
+import statsmodels.api as sm
+from statsmodels.formula.api import ols
+def analyze_factorial(df: pd.DataFrame, response: str,
+                       factors: list[str]) -> dict:
+    """
+    Analyze a factorial experiment using ANOVA
+    (main effects plus all two-factor interactions).
+    """
+    # Build formula terms; joining one list avoids a dangling '+' when
+    # there is a single factor and therefore no interaction term
+    terms = [f'C({f})' for f in factors]
+    terms += [f'C({f1}):C({f2})'
+              for i, f1 in enumerate(factors)
+              for f2 in factors[i+1:]]
+    formula = f'{response} ~ ' + ' + '.join(terms)
+    model = ols(formula, data=df).fit()
+    anova_table = sm.stats.anova_lm(model, typ=2)
+    # Effect sizes (eta-squared)
+    ss_total = anova_table['sum_sq'].sum()
+    anova_table['eta_sq'] = anova_table['sum_sq'] / ss_total
+    return {
+        'anova_table': anova_table,
+        'r_squared': model.rsquared,
+        'significant_effects': anova_table[anova_table['PR(>F)'] < 0.05].index.tolist()
+    }
+```
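For two-level designs the ANOVA result can be cross-checked by hand: with levels coded as -1 and +1, a main effect is the mean response at the high level minus the mean response at the low level. The sketch below shows that contrast calculation in plain Python. The function name `main_effects` and the response values are made up for illustration.

```python
from itertools import product
from statistics import mean

def main_effects(runs, responses):
    """Estimate main effects in a two-level design with coded levels -1/+1.

    runs: list of tuples of coded factor levels, one tuple per run.
    responses: list of measured responses in the same order.
    """
    k = len(runs[0])
    effects = []
    for j in range(k):
        high = [y for run, y in zip(runs, responses) if run[j] == 1]
        low = [y for run, y in zip(runs, responses) if run[j] == -1]
        effects.append(mean(high) - mean(low))  # contrast for factor j
    return effects

# Hypothetical 2^2 design with synthetic responses y = 10 + 3*A - 1*B
runs = list(product([-1, 1], repeat=2))
responses = [10 + 3 * a - 1 * b for a, b in runs]
print(main_effects(runs, responses))
```

Each effect is twice the corresponding coefficient in the coded regression model, because moving from -1 to +1 is a change of two coded units.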
+## Fractional Factorial Designs
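+When a full factorial requires too many runs, a fractional factorial runs a chosen subset of the combinations and accepts that some effects are aliased with others: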
+```python
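+def fractional_factorial_2k(k: int, resolution: int = 3) -> pd.DataFrame:
+    """
+    Generate a 2^(k-p) fractional factorial design.
+    Args:
+        k: Number of factors (3 to 7 are catalogued here)
+        resolution: Minimum design resolution (3, 4, or 5 for III, IV, or V)
+    """
+    from pyDOE2 import fracfact
+    # fracfact needs explicit generators (e.g. 'ab' means that column = A*B);
+    # a string of single letters only would produce a full factorial.
+    # Common designs, listed per k as (resolution, generators), fewest runs first:
+    # 2^(3-1) = 4 runs (Resolution III)
+    # 2^(4-1) = 8 runs (Resolution IV)
+    # 2^(5-2) = 8 runs (Resolution III)
+    # 2^(7-4) = 8 runs (Resolution III, equivalent to the 8-run Plackett-Burman design)
+    catalog = {
+        3: [(3, 'a b ab')],
+        4: [(4, 'a b c abc')],
+        5: [(3, 'a b c ab ac'), (5, 'a b c d abcd')],
+        6: [(3, 'a b c ab ac bc'), (4, 'a b c d abc bcd')],
+        7: [(3, 'a b c ab ac bc abc'), (4, 'a b c d abc bcd acd')],
+    }
+    options = [gens for res, gens in catalog.get(k, []) if res >= resolution]
+    if not options:
+        raise ValueError(f"No catalogued design for k={k} at resolution >= {resolution}")
+    design = fracfact(options[0])
+    df = pd.DataFrame(design, columns=[f'Factor_{i+1}' for i in range(design.shape[1])])
+    print(f"Fractional factorial: {len(df)} runs for {k} factors")
+    return df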
+```
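The cost of a fractional design is aliasing, which is easiest to see in the smallest case. The sketch below builds the 2^(3-1) half fraction by hand with the generator C = AB and checks the defining relation I = ABC. It uses only the standard library, and `half_fraction_2_3` is an illustrative name rather than a pyDOE2 function.

```python
from itertools import product

def half_fraction_2_3():
    """Build the 2^(3-1) design with generator C = AB (coded levels -1/+1)."""
    return [(a, b, a * b) for a, b in product([-1, 1], repeat=2)]

design = half_fraction_2_3()
for run in design:
    print(run)

# Aliasing check: the C column is identical to the elementwise product A*B,
# so the main effect of C cannot be separated from the AB interaction.
c_column = [c for _, _, c in design]
ab_column = [a * b for a, b, _ in design]
print(c_column == ab_column)

# Defining relation I = ABC: the product of all three columns is +1 in every run.
print(all(a * b * c == 1 for a, b, c in design))
```

In a Resolution III design like this one, main effects are clear of each other but each is aliased with a two-factor interaction, which is why such designs are mainly used for screening.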
+## Response Surface Methodology (RSM)
+### Central Composite Design
+```python
+def central_composite_design(factor_ranges: dict) -> pd.DataFrame:
+    """
+    Create a Central Composite Design for response surface optimization.
+    """
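+    from pyDOE2 import ccdesign
+    k = len(factor_ranges)
+    # center takes two counts: center points in the factorial block and in the star block
+    design_coded = ccdesign(k, center=(4, 4), alpha='orthogonal', face='circumscribed')
+    factor_names = list(factor_ranges.keys())
+    df = pd.DataFrame(design_coded, columns=factor_names)
+    # Convert from coded units to natural units: -1 maps to low and +1 to high.
+    # In a circumscribed design the star points sit beyond +/-1, so they fall
+    # outside [low, high]; use face='inscribed' if the limits are hard bounds.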
+    for name, (low, high) in factor_ranges.items():
+        center = (high + low) / 2
+        half_range = (high - low) / 2
+        df[name] = center + df[name] * half_range
+    return df
+# Example: optimize a chemical reaction
+design = central_composite_design({
+    'Temperature_C': [50, 90],
+    'pH': [5, 9],
+    'Time_min': [10, 60]
+})
+```
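To make the structure of a central composite design concrete, the sketch below lists its coded points without any DOE library: 2^k factorial corners, 2k axial (star) points, and repeated center points. It assumes the rotatable choice alpha = (2^k)^(1/4), which differs from the `'orthogonal'` option used above, and a single total center-point count. `ccd_points` is a hypothetical helper name.

```python
from itertools import product

def ccd_points(k, n_center=4):
    """Coded points of a circumscribed central composite design (rotatable alpha)."""
    alpha = (2 ** k) ** 0.25  # rotatable choice: fourth root of the factorial run count
    factorial = [tuple(float(v) for v in p) for p in product([-1, 1], repeat=k)]
    axial = []
    for j in range(k):
        for sign in (-1, 1):
            point = [0.0] * k
            point[j] = sign * alpha  # one factor at +/-alpha, the rest at center
            axial.append(tuple(point))
    center = [tuple([0.0] * k)] * n_center
    return factorial + axial + center

points = ccd_points(3)
print(len(points))  # 8 factorial + 6 axial + 4 center = 18 runs
```

For three factors the axial distance is about 1.68 coded units, which is why the natural-unit settings of the star points extend past the stated low and high values.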
+## Randomization and Blinding
+- **Single-blind**: Participants do not know their treatment assignment
+- **Double-blind**: Neither participants nor experimenters know assignments
+- **Allocation concealment**: Assignment sequence is hidden until the moment of assignment
+For computer-generated randomization, always record and report the random seed used. Use block randomization to ensure balanced groups when enrollment is sequential.
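Block randomization with a recorded seed can be sketched as follows. This is a minimal permuted-block scheme in plain Python, assuming two arms labelled "A" and "B", a block size of 4, and an arbitrary seed; all of these are illustrative choices.

```python
import random

def block_randomization(n_participants, arms=("A", "B"), block_size=4, seed=12345):
    """Permuted-block allocation sequence; block_size must be a multiple of len(arms)."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # record this seed in the study documentation
    sequence = []
    while len(sequence) < n_participants:
        block = list(arms) * (block_size // len(arms))  # balanced block
        rng.shuffle(block)                              # random order within the block
        sequence.extend(block)
    return sequence[:n_participants]

allocation = block_randomization(10)
print("seed=12345", allocation)
```

After every complete block the arms are exactly balanced, so group sizes never drift far apart even if enrollment stops early. For allocation concealment the sequence should be generated and held by someone who does not enroll participants.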
+## Reporting Checklist
+Follow CONSORT (clinical trials), ARRIVE (animal studies), or STROBE (observational) guidelines:
+- State the primary and secondary outcomes before analysis
+- Report all planned analyses, including non-significant results
+- Describe randomization method and any deviations from protocol
+- Include sample size justification with power analysis parameters

package/skills/research/methodology/grad-school-guide/SKILL.md ADDED

@@ -0,0 +1,182 @@
+---
+name: grad-school-guide
+description: "Practical advice for thriving in PhD programs and academic research"
+metadata:
+  openclaw:
+    emoji: "🎓"
+    category: "research"
+    subcategory: "methodology"
+    keywords: ["research question formulation", "hypothesis formulation", "theoretical framework", "conceptual model"]
+    source: "https://github.com/poloclub/awesome-grad-school"
+---
+# Graduate School Research Guide
+## Overview
+Graduate school -- particularly a PhD program -- is a multi-year commitment that demands not only technical skills but also effective research methodology, advisor management, paper writing strategies, and career planning. The difference between thriving and merely surviving often comes down to having the right mental models and practical frameworks for the research process.
+This guide distills wisdom from the awesome-grad-school repository (450+ stars, maintained by the Polo Club of Data Science at Georgia Tech) and supplements it with actionable frameworks for formulating research questions, developing hypotheses, structuring a theoretical framework, and managing the end-to-end research lifecycle. The advice here applies broadly across STEM and social-science disciplines.
+Whether you are an incoming PhD student, a mid-program researcher seeking to improve your productivity, or an advanced candidate preparing for the job market, this skill provides concrete tools for each stage of the journey.
+## Formulating Research Questions
+A strong research question is the foundation of any good paper. It should be specific, answerable, and significant.
+### The FINER Criteria
+| Criterion | Description | Example Check |
+|-----------|-------------|---------------|
+| **F**easible | Can be answered with available resources | Do you have the data, compute, and time? |
+| **I**nteresting | Engages the research community | Would peers read this at a top venue? |
+| **N**ovel | Not already answered | Has a literature search (e.g., on Semantic Scholar) been done? |
+| **E**thical | Follows research ethics standards | Does it require IRB approval? |
+| **R**elevant | Advances the field meaningfully | Does it connect to open problems? |
+### From Topic to Question: A Step-by-Step Process
+1. **Survey the landscape.** Read 20-30 recent papers in your area.
+2. **Identify gaps.** Look for "future work" sections and limitations.
+3. **Narrow progressively.** Topic -> Sub-topic -> Specific question.
+4. **Phrase as a question.** "Does X improve Y compared to Z in context W?"
+5. **Test with the "so what?" check.** Whichever way the answer turns out, does it matter?
+Example progression:
+```
+Topic:    Natural language processing
+Sub-topic: Low-resource language translation
+Gap:      Few-shot methods underperform on morphologically rich languages
+Question: Can morphological decomposition improve few-shot translation
+          quality for agglutinative languages?
+```
+## Developing Hypotheses and Theoretical Frameworks
+### From Question to Hypothesis
+A hypothesis is a testable, falsifiable prediction derived from your research question:
+- **Directional:** "Method A will achieve higher BLEU scores than Method B on agglutinative language pairs."
+- **Non-directional:** "There will be a significant difference in BLEU scores between Method A and Method B."
+- **Null (H0):** "There is no significant difference in BLEU scores between Method A and Method B."
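One way to test the null hypothesis above when both methods are scored on the same test items is a paired sign-flip permutation test. The sketch below is a minimal standard-library version. The score lists are made-up numbers for illustration only, and `paired_permutation_test` is a hypothetical helper name.

```python
import random
from statistics import mean

def paired_permutation_test(scores_a, scores_b, n_permutations=10000, seed=0):
    """Two-sided sign-flip permutation test on paired score differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_permutations):
        # Under H0 the sign of each paired difference is arbitrary
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero
    return (extreme + 1) / (n_permutations + 1)

# Made-up per-document scores for two methods (illustrative only)
method_a = [31.2, 28.4, 35.1, 30.0, 27.9, 33.3, 29.8, 32.5]
method_b = [29.0, 27.1, 33.0, 29.4, 26.5, 31.8, 28.9, 30.1]
p_value = paired_permutation_test(method_a, method_b)
print(f"p = {p_value:.4f}")
```

A small p-value is evidence against H0; the directional hypothesis additionally requires that the mean difference has the predicted sign.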
+### Building a Conceptual Model
+A conceptual model maps the relationships between your key variables:
+```
+Independent Variable      Moderator        Dependent Variable
+[Morphological           [Language         [Translation
+ Decomposition]  ------> Typology]  -----> Quality (BLEU)]
+        |                                        ^
+        |          Mediator                      |
+        +-------> [Vocabulary                    |
+                   Coverage] --------------------+
+```
+Document your conceptual model with:
+1. **Constructs:** The abstract concepts (e.g., "translation quality").
+2. **Operationalizations:** How you measure each construct (e.g., BLEU, COMET scores).
+3. **Relationships:** Hypothesized causal or correlational links.
+4. **Boundary conditions:** Where the model applies and where it does not.
+## Managing Your Advisor and Research Workflow
+### Communication Frameworks
+**The Weekly Update Email:**
+```
+Subject: Weekly Update - [Your Name] - Week of [Date]
+1. ACCOMPLISHED THIS WEEK
+   - Completed experiment X with results Y
+   - Drafted Section 3 of the paper
+2. BLOCKERS
+   - Need access to GPU cluster for large-scale runs
+   - Waiting on co-author feedback on Section 2
+3. PLAN FOR NEXT WEEK
+   - Run ablation study on components A, B, C
+   - Begin writing Section 4
+4. DISCUSSION ITEMS FOR MEETING
+   - Should we include dataset Z in our evaluation?
+   - Timeline for submission to [Conference]
+```
+### Research Productivity System
+| Practice | Cadence | Tool |
+|----------|---------|------|
+| Daily progress log | End of each day | Plain text file or Notion |
+| Literature reading | 2-3 papers/week | Zotero + annotations |
+| Experiment tracking | Per run | Weights & Biases or MLflow |
+| Writing | 30 min daily minimum | LaTeX or Markdown |
+| Advisor meeting prep | Weekly | Structured update email |
+| Research talks | Monthly (lab meeting) | 15-min presentation |
+## Paper Writing Strategy
+### The Reverse-Outline Method
+1. Write bullet points for each section (1-2 sentences per paragraph).
+2. Order bullets by logical flow.
+3. Expand each bullet into a full paragraph.
+4. Revise for transitions and coherence.
+### Section-by-Section Tips
+- **Introduction:** Open with a concrete problem, not "In recent years..."
+- **Related Work:** Organize by theme, not chronologically. Compare approaches, do not just list them.
+- **Methods:** Write so a competent researcher can reproduce your work.
+- **Results:** Lead with the most important finding. Use tables for exact numbers, figures for trends.
+- **Discussion:** Address limitations honestly. Reviewers respect self-awareness.
+### Handling Rejection
+Paper rejection is a normal part of academic life. The awesome-grad-school community recommends:
+1. **Allow 24-48 hours to process emotions.** Do not respond immediately.
+2. **Categorize each review comment** as: (a) valid and fixable, (b) valid but requires new experiments, (c) misunderstanding to clarify, or (d) subjective disagreement.
+3. **Create an action plan** for addressing category (a) and (b) items.
+4. **Resubmit to the next venue** with improvements, not just the same paper.
+## Career Planning
+### Timeline for a 5-Year PhD
+| Year | Focus | Milestones |
+|------|-------|-----------|
+| 1 | Coursework + exploration | Pass qualifying exam, identify area |
+| 2 | First project + first paper | Submit to workshop or conference |
+| 3 | Core research + publications | 1-2 papers at top venues |
+| 4 | Thesis writing + job market prep | Draft thesis proposal, internship |
+| 5 | Defense + job search | Submit thesis, interview |
+### Building Visibility
+- Maintain a personal academic website with publications and blog posts.
+- Present at conferences and workshops.
+- Share preprints on arXiv before publication.
+- Engage constructively on academic social media.
+## Best Practices
+- **Start writing early.** The paper is not separate from the research -- writing clarifies thinking.
+- **Build a library of reusable code.** Experiment templates, plotting scripts, and data loaders save hours on each project.
+- **Invest in relationships.** Collaborators, mentors, and peers are your most valuable resource.
+- **Take care of your health.** PhD burnout is real. Set boundaries and maintain activities outside research.
+- **Read broadly.** Some of the best ideas come from adjacent fields.
+- **Track your accomplishments.** Maintain a running CV and a "brag document" for annual reviews and job applications.
+## References
+- [awesome-grad-school](https://github.com/poloclub/awesome-grad-school) -- Curated grad school advice (450+ stars)
+- [A Survival Guide to a PhD](http://karpathy.github.io/2016/09/07/phd/) -- Andrej Karpathy
+- [Modest Advice for New Graduate Students](https://stearnslab.yale.edu/modest-advice) -- Stephen Stearns, Yale
+- [De-Mystifying Good Research](https://bigaidream.gitbooks.io/tech-blog/content/2014/de-mystifying-good-research.html) -- Fei-Fei Li
+- [How to Write a Great Research Paper](https://www.microsoft.com/en-us/research/academic-program/write-great-research-paper/) -- Simon Peyton Jones

package/skills/research/methodology/grounded-theory-guide/SKILL.md ADDED

@@ -0,0 +1,171 @@
+---
+name: grounded-theory-guide
+description: "Apply grounded theory methodology to develop theory from data"
+metadata:
+  openclaw:
+    emoji: "seedling"
+    category: "research"
+    subcategory: "methodology"
+    keywords: ["grounded theory", "qualitative methodology", "theoretical sampling", "constant comparison", "coding"]
+    source: "wentor-research-plugins"
+---
+# Grounded Theory Guide
+A skill for applying grounded theory methodology (GTM) to generate theory grounded in empirical data. Covers the three major schools (Glaser, Strauss/Corbin, Charmaz), coding procedures, theoretical sampling, memo writing, and criteria for evaluating grounded theories.
+## Three Schools of Grounded Theory
+### Comparing Approaches
+| Aspect | Classic (Glaser) | Straussian (Strauss & Corbin) | Constructivist (Charmaz) |
+|--------|-----------------|------------------------------|-------------------------|
+| Ontology | Objective reality | Pragmatist | Relativist/constructivist |
+| Literature review | Delay until theory emerges | Early but non-constraining | Early, reflexive engagement |
+| Coding paradigm | Open, selective, theoretical | Open, axial, selective | Initial, focused, theoretical |
+| Verification | Emergent fit | Systematic validation | Co-construction with participants |
+| Core output | Substantive theory | Process model | Interpretive theory |
+| Key text | Glaser (1978) | Strauss & Corbin (1998) | Charmaz (2014) |
+### Choosing an Approach
+```
+Use Classic GTM when:
+  - You want the theory to emerge with minimal preconception
+  - You are studying a process in a substantive area
+Use Straussian GTM when:
+  - You need a structured, systematic coding procedure
+  - Your discipline values replicable analytical steps
+Use Constructivist GTM when:
+  - You acknowledge the researcher's role in co-creating meaning
+  - You study experiences, identities, or social processes
+  - You work in health, education, or social science
+```
+## The Coding Process
+### Three-Stage Coding
+```python
+def grounded_theory_coding_stages() -> dict:
+    """
+    Describe the three stages of grounded theory coding.
+    """
+    return {
+        "stage_1_initial_coding": {
+            "also_called": "Open coding",
+            "description": (
+                "Examine data line by line or incident by incident. "
+                "Generate codes that stay close to the data. "
+                "Use gerunds (action words ending in -ing) to capture processes."
+            ),
+            "example": {
+                "data": "I started looking for help online because the doctor "
+                        "did not explain anything to me.",
+                "codes": [
+                    "Seeking information online",
+                    "Experiencing communication gap with provider",
+                    "Taking initiative in own care"
+                ]
+            },
+            "tips": [
+                "Code quickly -- do not overthink individual codes",
+                "Stay open; do not force data into preexisting categories",
+                "Code actions and processes, not topics",
+                "Write memos about ideas that arise during coding"
+            ]
+        },
+        "stage_2_focused_coding": {
+            "also_called": "Axial coding (Strauss & Corbin) or selective coding (Glaser)",
+            "description": (
+                "Select the most frequent and significant initial codes. "
+                "Use them to sort and synthesize larger amounts of data. "
+                "Identify relationships between categories."
+            ),
+            "tasks": [
+                "Elevate initial codes to categories",
+                "Identify properties and dimensions of each category",
+                "Compare categories across cases",
+                "Begin developing a conceptual framework"
+            ]
+        },
+        "stage_3_theoretical_coding": {
+            "also_called": "Selective coding (Strauss & Corbin)",
+            "description": (
+                "Identify the core category that integrates all other "
+                "categories into a coherent theoretical framework. "
+                "Specify relationships between categories."
+            ),
+            "output": "A substantive theory explaining the phenomenon"
+        }
+    }
+```
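Moving from initial to focused coding starts with seeing which initial codes recur across participants. The sketch below counts, for each code, how many distinct interviews it appears in. The segment structure, the interview IDs, and the `min_interviews` threshold are assumptions for illustration; frequency only shortlists candidates, and judging which codes are significant remains analytic work.

```python
from collections import defaultdict

# Hypothetical initial codes assigned to three interview excerpts
coded_segments = [
    {"interview": "P01", "codes": ["Seeking information online",
                                   "Experiencing communication gap with provider"]},
    {"interview": "P02", "codes": ["Seeking information online",
                                   "Taking initiative in own care"]},
    {"interview": "P03", "codes": ["Seeking information online",
                                   "Experiencing communication gap with provider"]},
]

def candidate_focused_codes(segments, min_interviews=2):
    """Return (code, interview count) pairs for codes seen in enough interviews."""
    interviews_by_code = defaultdict(set)
    for segment in segments:
        for code in segment["codes"]:
            interviews_by_code[code].add(segment["interview"])
    # Most widely shared codes first; ties broken alphabetically
    ranked = sorted(interviews_by_code.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(code, len(ids)) for code, ids in ranked if len(ids) >= min_interviews]

for code, count in candidate_focused_codes(coded_segments):
    print(count, code)
```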
+## Theoretical Sampling
+### Sampling Driven by Emerging Theory
+```
+Traditional sampling: Decide sample before data collection
+Theoretical sampling: Let the emerging theory guide who/what to sample next
+Process:
+  1. Collect initial data (purposive sampling)
+  2. Analyze data, identify emerging categories
+  3. Ask: "Where should I look next to develop these categories?"
+  4. Sample deliberately to fill gaps in the emerging theory
+  5. Continue until theoretical saturation
+Example:
+  Initial interviews: Patients with chronic illness
+  Emerging category: "Navigating insurance barriers"
+  Next sample: Interview insurance navigators and social workers
+  Emerging category: "Stigma in seeking help"
+  Next sample: Interview patients who avoided seeking help
+```
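Step 5 can be supported, though not decided, by tracking how many new codes each additional interview contributes. The sketch below is a simple heuristic under stated assumptions: interviews are listed in the order they were analyzed, and "no new codes in the last two interviews" is treated as a signal worth examining. Theoretical saturation itself concerns whether the properties of each category are fully developed, which this count cannot establish. All names and data are hypothetical.

```python
def new_codes_per_interview(codes_by_interview):
    """For each interview in collection order, list the codes not seen before."""
    seen = set()
    history = []
    for interview, codes in codes_by_interview:
        fresh = sorted(set(codes) - seen)
        history.append((interview, fresh))
        seen.update(codes)
    return history

def looks_saturated(history, window=2):
    """Heuristic: True when the last `window` interviews added no new codes."""
    if len(history) < window:
        return False
    return all(not fresh for _, fresh in history[-window:])

# Hypothetical coding results, in the order interviews were analyzed
codes_by_interview = [
    ("P01", ["Navigating insurance barriers", "Seeking information online"]),
    ("P02", ["Navigating insurance barriers", "Stigma in seeking help"]),
    ("P03", ["Stigma in seeking help"]),
    ("P04", ["Seeking information online", "Navigating insurance barriers"]),
]
history = new_codes_per_interview(codes_by_interview)
for interview, fresh in history:
    print(interview, fresh)
print("saturated:", looks_saturated(history))
```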
+## Memo Writing
+### The Engine of Grounded Theory
+Memos are the researcher's running commentary on codes, categories, and theoretical ideas. They are the primary mechanism for developing theory.
+```
+Memo types:
+  - Code memos: Define and elaborate a code or category
+  - Theoretical memos: Explore relationships between categories
+  - Operational memos: Record methodological decisions
+  - Reflexive memos: Examine researcher influence on the analysis
+Memo example:
+  MEMO: "Becoming an expert patient" (2026-03-05)
+  Several participants describe a transition from passive
+  recipient of care to active manager of their condition.
+  This process seems to involve three phases: (1) initial
+  confusion and dependence, (2) information seeking and
+  experimentation, (3) confident self-management. The trigger
+  appears to be a critical incident (a misdiagnosis, a bad
+  interaction with a provider) that motivates the person to
+  take control. Compare with Corbin & Strauss's trajectory
+  framework. Need to sample someone early in the trajectory
+  to test whether the trigger is consistent.
+```
+## Evaluating Grounded Theory
+### Quality Criteria
+| Criterion | Description | How to Demonstrate |
+|-----------|------------|-------------------|
+| Fit | Theory fits the data it was derived from | Show clear evidence trail from data to codes to categories |
+| Relevance | Theory addresses a real concern of participants | Member checking, resonance with practitioners |
+| Workability | Theory explains the process and enables prediction | Apply the theory to new cases |
+| Modifiability | Theory can be updated with new data | Show how the theory evolved during the study |
+| Credibility | Analysis is thorough and systematic | Audit trail, reflexive memos, theoretical saturation |
+## Reporting a Grounded Theory Study
+Include: a clear description of the coding process and how categories were derived, a diagram or model of the theory, representative quotes for each major category, an explanation of theoretical sampling decisions, and a discussion of how the theory relates to existing literature. Use the SRQR (Standards for Reporting Qualitative Research) checklist to ensure completeness.