@wentorai/research-plugins 1.0.0 → 1.1.0

Files changed (203)

package/skills/domains/cs/formal-verification-guide/SKILL.md ADDED

@@ -0,0 +1,298 @@
+---
+name: formal-verification-guide
+description: "Formal methods, theorem proving, and model checking for CS research"
+metadata:
+  openclaw:
+    emoji: "check-mark"
+    category: "domains"
+    subcategory: "cs"
+    keywords: ["formal-verification", "theorem-proving", "model-checking", "tla-plus", "coq", "isabelle"]
+    source: "wentor"
+---
+# Formal Verification Guide
+A skill for applying formal methods to verify software and hardware correctness. Covers model checking, interactive theorem proving, specification languages, and practical verification workflows used in systems and programming language research.
+## Verification Approaches Overview
+### Methods Comparison
+| Approach | Technique | Strengths | Limitations |
+|----------|-----------|-----------|-------------|
+| Model checking | Exhaustive state exploration | Fully automatic, produces counterexamples | State space explosion |
+| Theorem proving | Interactive proof construction | Handles infinite state | Requires expert effort |
+| Abstract interpretation | Sound static analysis | Automatic, scales well | May report false positives |
+| SMT solving | Constraint satisfiability | Powerful automation | Incomplete outside decidable theories (e.g., quantifiers, nonlinear integer arithmetic) |
+| Runtime verification | Execution monitoring | Low barrier, practical | Only checks observed runs |
+## TLA+ Specification
+### Specifying Distributed Protocols
+TLA+ is a widely used specification language for concurrent and distributed systems:
+```tla
+--------------------------- MODULE TwoPhaseCommit -------------------------
+EXTENDS Integers, Sequences, FiniteSets
+CONSTANTS RM  \* Set of resource managers
+VARIABLES
+    rmState,      \* rmState[r] is the state of resource manager r
+    tmState,      \* State of the transaction manager
+    tmPrepared,   \* Set of RMs that have sent "Prepared"
+    msgs          \* Set of messages sent
+vars == <<rmState, tmState, tmPrepared, msgs>>
+Init ==
+    /\ rmState = [r \in RM |-> "working"]
+    /\ tmState = "init"
+    /\ tmPrepared = {}
+    /\ msgs = {}
+\* RM r prepares to commit
+RMPrepare(r) ==
+    /\ rmState[r] = "working"
+    /\ rmState' = [rmState EXCEPT ![r] = "prepared"]
+    /\ msgs' = msgs \union {[type |-> "Prepared", rm |-> r]}
+    /\ UNCHANGED <<tmState, tmPrepared>>
+\* TM receives a Prepared message from RM r
+TMRcvPrepared(r) ==
+    /\ tmState = "init"
+    /\ [type |-> "Prepared", rm |-> r] \in msgs
+    /\ tmPrepared' = tmPrepared \union {r}
+    /\ UNCHANGED <<rmState, tmState, msgs>>
+\* TM commits (all RMs have prepared)
+TMCommit ==
+    /\ tmState = "init"
+    /\ tmPrepared = RM
+    /\ tmState' = "committed"
+    /\ msgs' = msgs \union {[type |-> "Commit"]}
+    /\ UNCHANGED <<rmState, tmPrepared>>
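+\* RM r commits after receiving the Commit message
+RMRcvCommitMsg(r) ==
+    /\ [type |-> "Commit"] \in msgs
+    /\ rmState' = [rmState EXCEPT ![r] = "committed"]
+    /\ UNCHANGED <<tmState, tmPrepared, msgs>>
+\* Next-state relation and complete specification
+Next ==
+    \/ TMCommit
+    \/ \E r \in RM : RMPrepare(r) \/ TMRcvPrepared(r) \/ RMRcvCommitMsg(r)
+Spec == Init /\ [][Next]_vars
+\* Safety property: No RM commits unless TM has committed
+Consistency ==
+    \A r \in RM : rmState[r] = "committed" => tmState = "committed"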
+========================================================================
+```
+### Running the TLC Model Checker
+```bash
+# Install TLA+ Toolbox or use command-line TLC
+# Define model with specific constants
+# RM = {"rm1", "rm2", "rm3"}
+java -jar tla2tools.jar -config TwoPhaseCommit.cfg TwoPhaseCommit.tla
+# TLC will explore all reachable states and verify:
+# - Absence of deadlock (checked by default; pass -deadlock to skip)
+# - Safety properties (invariants)
+# - Liveness properties (temporal formulas)
+```
+## Interactive Theorem Proving
+### Coq Proof Assistant
+```coq
+(* Example: Proving properties of a simple functional program *)
+(* Define natural number addition *)
+Fixpoint add (n m : nat) : nat :=
+  match n with
+  | O => m
+  | S n' => S (add n' m)
+  end.
+(* Prove: 0 + n = n (left identity) *)
+Theorem add_0_l : forall n : nat, add 0 n = n.
+Proof.
+  intro n.
+  simpl.    (* simplification reduces add 0 n to n *)
+  reflexivity.
+Qed.
+(* Prove: n + 0 = n (right identity, requires induction) *)
+Theorem add_0_r : forall n : nat, add n 0 = n.
+Proof.
+  intro n.
+  induction n as [| n' IHn'].
+  - (* Base case: n = 0 *)
+    simpl. reflexivity.
+  - (* Inductive step: n = S n' *)
+    simpl.               (* add (S n') 0 = S (add n' 0) *)
+    rewrite IHn'.        (* apply induction hypothesis *)
+    reflexivity.
+Qed.
+(* Prove associativity of addition *)
+Theorem add_assoc : forall a b c : nat,
+  add a (add b c) = add (add a b) c.
+Proof.
+  intros a b c.
+  induction a as [| a' IHa'].
+  - simpl. reflexivity.
+  - simpl. rewrite IHa'. reflexivity.
+Qed.
+```
+### Isabelle/HOL
+```isabelle
+theory SimpleVerification
+  imports Main
+begin
+(* Define a recursive function *)
+fun fib :: "nat => nat" where
+  "fib 0 = 0"
+| "fib (Suc 0) = 1"
+| "fib (Suc (Suc n)) = fib (Suc n) + fib n"
+(* Prove a property *)
+lemma fib_positive: "0 < fib (Suc n)"
+  by (induction n rule: fib.induct) auto
+(* Verify a sorting algorithm *)
+fun insert :: "nat => nat list => nat list" where
+  "insert x [] = [x]"
+| "insert x (y # ys) = (if x <= y then x # y # ys else y # insert x ys)"
+fun isort :: "nat list => nat list" where
+  "isort [] = []"
+| "isort (x # xs) = insert x (isort xs)"
+(* Key lemma for isort: insertion preserves sortedness *)
+lemma sorted_insert: "sorted (insert x xs) = sorted xs"
+  sorry (* full proof requires additional lemmas *)
+end
+```
+## SMT Solving
+### Z3 for Program Verification
+```python
+from z3 import Solver, Int, Bool, And, Or, Not, Implies, ForAll, sat, unsat
+def verify_array_bounds():
+    """
+    Verify that an array access is always within bounds.
+    Model a loop: for i = 0 to n-1, access a[i].
+    """
+    s = Solver()
+    n = Int("n")
+    i = Int("i")
+    # Precondition: n > 0
+    s.add(n > 0)
+    # Loop invariant: 0 <= i < n at each access
+    s.add(i >= 0)
+    s.add(i < n)
+    # Verify: the access a[i] is within bounds [0, n)
+    s.add(Not(And(i >= 0, i < n)))  # try to find a violation
+    result = s.check()
+    if result == unsat:
+        return "VERIFIED: array access is always within bounds"
+    else:
+        return f"COUNTEREXAMPLE: {s.model()}"
+def verify_integer_overflow():
+    """
+    Check if integer addition can overflow for given constraints.
+    """
+    from z3 import BitVec, BitVecVal
+    s = Solver()
+    # 32-bit signed integers
+    x = BitVec("x", 32)
+    y = BitVec("y", 32)
+    # Preconditions: both positive
+    s.add(x > 0)
+    s.add(y > 0)
+    # Check: can x + y wrap around to negative?
+    s.add(x + y < 0)
+    if s.check() == sat:
+        m = s.model()
+        return {
+            "overflow_possible": True,
+            "x": m[x].as_long(),
+            "y": m[y].as_long(),
+        }
+    return {"overflow_possible": False}
+```
+## Model Checking with SPIN
+### Promela Specification
+```promela
+/* Mutual exclusion with Peterson's algorithm */
+bool flag[2] = false;
+byte turn = 0;
+byte critical = 0;  /* count of processes in critical section */
+active [2] proctype process() {
+    byte me = _pid;
+    byte other = 1 - _pid;
+    do
+    :: /* Entry protocol */
+       flag[me] = true;
+       turn = other;
+       (flag[other] == false || turn == me);
+       /* Critical section */
+       critical++;
+       assert(critical == 1);  /* mutual exclusion */
+       critical--;
+       /* Exit protocol */
+       flag[me] = false;
+    od
+}
+/* LTL property: mutual exclusion always holds */
+ltl mutex { [] (critical <= 1) }
+```
+## Verification Workflow
+### Practical Verification Strategy
+1. **Specify**: Write a formal specification of the desired property
+2. **Model**: Create an abstract model of the system
+3. **Verify**: Run model checker or construct proof
+4. **Refine**: If counterexample found, fix the design or refine the model
+5. **Extract**: Generate verified code from the proof (Coq extraction, Isabelle code generation)
+### Common Properties to Verify
+| Property Type | Example | Specification Pattern |
+|--------------|---------|----------------------|
+| Safety | "No two processes in critical section" | `[] (count <= 1)` |
+| Liveness | "Every request is eventually served" | `[] (request -> <> response)` |
+| Deadlock freedom | "System always has an enabled transition" | `[] enabled` |
+| Termination | "Program always halts" | Well-founded ordering |
+## Tools and Resources
+- **TLA+ Toolbox**: IDE for TLA+ with integrated TLC model checker
+- **Coq**: Interactive theorem prover with program extraction
+- **Isabelle/HOL**: Higher-order logic prover with Sledgehammer automation
+- **Z3 / CVC5**: SMT solvers for automated reasoning
+- **SPIN**: Model checker for concurrent systems (Promela)
+- **CBMC**: Bounded model checker for C programs
+- **Dafny**: Verification-aware programming language (Microsoft)
+- **Lean 4**: Modern theorem prover and programming language
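
Reviewer note on the added guide: the "exhaustive state exploration" that the Methods Comparison table attributes to model checking can be illustrated with a small explicit-state checker. The sketch below is not how SPIN or TLC are implemented; it is a minimal breadth-first search over a hand-written Python model of the Peterson process from the Promela example. The state encoding, the `skip_wait` switch (a deliberately broken variant with the wait guard removed), and all function names are assumptions made for illustration.

```python
from collections import deque


def check_invariant(init, successors, invariant):
    """Breadth-first search over all reachable states.

    Returns (True, number_of_states) if the invariant holds everywhere,
    or (False, trace) where trace is a shortest path to a violation.
    """
    parent = {init: None}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:  # walk back to the initial state
                trace.append(state)
                state = parent[state]
            return False, trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return True, len(parent)


def peterson_successors(state, skip_wait=False):
    """Successor states of a two-process Peterson model.

    state = (pc0, pc1, flag0, flag1, turn); program counter values:
    0 = set own flag, 1 = set turn, 2 = wait, 3 = critical section.
    skip_wait=True removes the wait guard (hypothetical broken variant).
    """
    pcs, flags, turn = list(state[0:2]), list(state[2:4]), state[4]
    result = []
    for me in (0, 1):
        other = 1 - me
        pc, fl, t = pcs.copy(), flags.copy(), turn
        if pcs[me] == 0:
            fl[me], pc[me] = True, 1
        elif pcs[me] == 1:
            t, pc[me] = other, 2
        elif pcs[me] == 2:
            if not (skip_wait or not flags[other] or turn == me):
                continue  # guard is false: this process is blocked
            pc[me] = 3
        else:  # leave the critical section
            fl[me], pc[me] = False, 0
        result.append((pc[0], pc[1], fl[0], fl[1], t))
    return result


def mutual_exclusion(state):
    """Invariant: the two processes are never both in the critical section."""
    return not (state[0] == 3 and state[1] == 3)


INIT = (0, 0, False, False, 0)
ok, n_states = check_invariant(INIT, peterson_successors, mutual_exclusion)
broken_ok, trace = check_invariant(
    INIT, lambda s: peterson_successors(s, skip_wait=True), mutual_exclusion
)
```

Here `ok` reports whether the invariant holds on every reachable state of the correct model, while the broken variant yields a counterexample `trace` from the initial state to a state with both processes in the critical section. The `parent` dictionary grows with the number of reachable states, which is the state space explosion named in the table.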

package/skills/domains/ecology/species-distribution-guide/SKILL.md ADDED

@@ -0,0 +1,343 @@
+---
+name: species-distribution-guide
+description: "Species distribution modeling with MaxEnt, SDM methods, and GBIF data"
+metadata:
+  openclaw:
+    emoji: "paw-prints"
+    category: "domains"
+    subcategory: "ecology"
+    keywords: ["species-distribution", "maxent", "sdm", "gbif", "ecological-niche", "biodiversity", "habitat"]
+    source: "wentor"
+---
+# Species Distribution Modeling Guide
+A skill for building and evaluating species distribution models (SDMs), covering occurrence data acquisition from biodiversity databases, environmental predictor preparation, model fitting with MaxEnt and ensemble methods, model evaluation, and projection under climate change scenarios.
+## Occurrence Data
+### Accessing GBIF Data
+The Global Biodiversity Information Facility (GBIF) is the primary source of species occurrence records:
+```python
+from pygbif import occurrences, species
+def download_occurrences(species_name: str, country: str = None,
+                          limit: int = 5000,
+                          has_coordinate: bool = True) -> dict:
+    """
+    Download species occurrence records from GBIF.
+    species_name: scientific name (e.g., 'Panthera tigris')
+    Returns cleaned occurrence records with coordinates.
+    """
+    # Get GBIF species key
+    name_result = species.name_backbone(name=species_name)
+    if "usageKey" not in name_result:
+        return {"error": f"Species not found: {species_name}"}
+    species_key = name_result["usageKey"]
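+    # Search occurrences. The GBIF search API returns at most 300 records
+    # per request, so page through results with `offset` until `limit`
+    params = {
+        "taxonKey": species_key,
+        "hasCoordinate": has_coordinate,
+        "hasGeospatialIssue": False,
+    }
+    if country:
+        params["country"] = country
+    raw_records = []
+    while len(raw_records) < limit:
+        page = occurrences.search(
+            offset=len(raw_records),
+            limit=min(300, limit - len(raw_records)),
+            **params,
+        )
+        page_results = page.get("results", [])
+        raw_records.extend(page_results)
+        if not page_results or page.get("endOfRecords", True):
+            break
+    # Clean records
+    records = []
+    seen_coords = set()
+    for rec in raw_records: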
+        lat = rec.get("decimalLatitude")
+        lon = rec.get("decimalLongitude")
+        if lat is None or lon is None:
+            continue
+        # Remove duplicate coordinates (rounded to 4 decimal places, roughly 11 m)
+        coord_key = (round(lat, 4), round(lon, 4))
+        if coord_key in seen_coords:
+            continue
+        seen_coords.add(coord_key)
+        records.append({
+            "species": rec.get("species", species_name),
+            "latitude": lat,
+            "longitude": lon,
+            "year": rec.get("year"),
+            "basis_of_record": rec.get("basisOfRecord"),
+            "institution": rec.get("institutionCode"),
+            "country": rec.get("country"),
+        })
+    return {
+        "species": species_name,
+        "gbif_key": species_key,
+        "n_records": len(records),
+        "records": records,
+    }
+```
+### Data Cleaning for SDM
+```python
+import pandas as pd
+import numpy as np
+def clean_occurrences(records: pd.DataFrame,
+                       study_extent: dict = None,
+                       thin_distance_km: float = 10.0) -> pd.DataFrame:
+    """
+    Clean occurrence records for species distribution modeling.
+    Removes missing and (0, 0) coordinates, clips to the study extent, and applies spatial thinning.
+    study_extent: {min_lon, max_lon, min_lat, max_lat}
+    thin_distance_km: minimum distance between retained points
+    """
+    df = records.copy()
+    # Remove records with missing coordinates
+    df = df.dropna(subset=["latitude", "longitude"])
+    # Remove records at (0,0) -- common data error
+    df = df[~((df.latitude == 0) & (df.longitude == 0))]
+    # Clip to study extent
+    if study_extent:
+        df = df[
+            (df.longitude >= study_extent["min_lon"]) &
+            (df.longitude <= study_extent["max_lon"]) &
+            (df.latitude >= study_extent["min_lat"]) &
+            (df.latitude <= study_extent["max_lat"])
+        ]
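+    # Spatial thinning (grid-based): keep one record per grid cell.
+    # Approximate conversion: 1 degree of latitude is about 111 km; longitude
+    # cells narrow toward the poles, so thinning is coarser than requested there
+    thin_deg = thin_distance_km / 111.0
+    # Use floor rather than int truncation so cells around 0 are not double-sized
+    df["grid_x"] = np.floor(df.longitude / thin_deg).astype(int)
+    df["grid_y"] = np.floor(df.latitude / thin_deg).astype(int)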
+    df = df.drop_duplicates(subset=["grid_x", "grid_y"])
+    df = df.drop(columns=["grid_x", "grid_y"])
+    return df.reset_index(drop=True)
+```
+## Environmental Predictors
+### WorldClim Bioclimatic Variables
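+A commonly used subset of the 19 WorldClim bioclimatic variables (temperature units below follow WorldClim 1.4; version 2 stores temperatures directly in degrees C):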
+| Variable | Description | Unit |
+|----------|-------------|------|
+| BIO1 | Annual Mean Temperature | C x 10 |
+| BIO2 | Mean Diurnal Range | C x 10 |
+| BIO4 | Temperature Seasonality | SD x 100 |
+| BIO5 | Max Temperature of Warmest Month | C x 10 |
+| BIO6 | Min Temperature of Coldest Month | C x 10 |
+| BIO12 | Annual Precipitation | mm |
+| BIO13 | Precipitation of Wettest Month | mm |
+| BIO14 | Precipitation of Driest Month | mm |
+| BIO15 | Precipitation Seasonality | CV |
+### Extracting Environmental Values
+```python
+import rasterio
+def extract_environmental_values(occurrence_coords: np.ndarray,
+                                   raster_paths: dict) -> pd.DataFrame:
+    """
+    Extract environmental variable values at occurrence locations.
+    occurrence_coords: array of (longitude, latitude) pairs
+    raster_paths: {variable_name: filepath} for each predictor raster
+    """
+    env_data = {}
+    for var_name, raster_path in raster_paths.items():
+        with rasterio.open(raster_path) as src:
+            band = src.read(1).astype(float)  # read the band once per raster
+            if src.nodata is not None:
+                band[band == src.nodata] = np.nan  # mask this raster's nodata
+            values = []
+            for lon, lat in occurrence_coords:
+                row, col = src.index(lon, lat)
+                if 0 <= row < src.height and 0 <= col < src.width:
+                    values.append(band[row, col])
+                else:
+                    values.append(np.nan)
+            env_data[var_name] = values
+    df = pd.DataFrame(env_data)
+    df["longitude"] = occurrence_coords[:, 0]
+    df["latitude"] = occurrence_coords[:, 1]
+    # Drop points that fall outside a raster or on nodata cells
+    return df.dropna()
+```
+## Model Fitting
+### MaxEnt (Maximum Entropy)
+MaxEnt is one of the most widely used SDM algorithms for presence-only data:
+```python
+import subprocess
+def run_maxent(samples_csv: str, env_layers_dir: str,
+                output_dir: str, features: str = "auto",
+                regularization: float = 1.0,
+                n_background: int = 10000) -> dict:
+    """
+    Run MaxEnt species distribution model.
+    samples_csv: CSV with columns species, longitude, latitude
+    env_layers_dir: directory containing .asc raster files
+    output_dir: directory for model outputs
+    """
+    cmd = [
+        "java", "-jar", "maxent.jar",
+        "-s", samples_csv,
+        "-e", env_layers_dir,
+        "-o", output_dir,
+        f"betamultiplier={regularization}",
+        f"maximumbackground={n_background}",
+        "responsecurves=true",
+        "jackknife=true",
+        "writeplotdata=true",
+        "autorun=true",
+    ]
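+    result = subprocess.run(cmd, capture_output=True, text=True)
+    if result.returncode != 0:
+        raise RuntimeError(f"MaxEnt failed: {result.stderr}")
+    # Parse results from maxentResults.csv (first row, i.e. first species)
+    import csv
+    results_file = f"{output_dir}/maxentResults.csv"
+    with open(results_file) as f:
+        reader = csv.DictReader(f)
+        row = next(reader)
+    # Per-variable columns are named "<variable> contribution"
+    contributions = {
+        k: float(v) for k, v in row.items()
+        if k and k.endswith(" contribution") and v
+    }
+    return {
+        "training_auc": float(row.get("Training AUC", 0)),
+        "test_auc": float(row.get("Test AUC", 0)),  # present only with test data
+        "n_training": int(row.get("#Training samples", 0)),
+        "regularized_gain": float(row.get("Regularized training gain", 0)),
+        "important_variables": contributions,
+    }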
+```
+### Ensemble SDM with Python
+```python
+from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
+from sklearn.linear_model import LogisticRegression
+from sklearn.model_selection import cross_val_predict
+from sklearn.metrics import roc_auc_score
+def ensemble_sdm(presence_env: pd.DataFrame,
+                  background_env: pd.DataFrame,
+                  predictor_cols: list[str]) -> dict:
+    """
+    Build an ensemble SDM from multiple algorithms.
+    presence_env: environmental values at presence points
+    background_env: environmental values at background/pseudo-absence points
+    """
+    # Prepare data
+    X_pres = presence_env[predictor_cols].values
+    X_bg = background_env[predictor_cols].values
+    X = np.vstack([X_pres, X_bg])
+    y = np.concatenate([np.ones(len(X_pres)), np.zeros(len(X_bg))])
+    models = {
+        "random_forest": RandomForestClassifier(n_estimators=500, max_depth=10),
+        "gbm": GradientBoostingClassifier(n_estimators=300, max_depth=5,
+                                            learning_rate=0.05),
+        "logistic": LogisticRegression(max_iter=1000),
+    }
+    results = {}
+    predictions = {}
+    for name, model in models.items():
+        # Cross-validated predictions
+        cv_pred = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
+        auc = roc_auc_score(y, cv_pred)
+        model.fit(X, y)
+        results[name] = {"auc": round(auc, 4), "model": model}
+        predictions[name] = cv_pred
+    # Weighted ensemble (weight by AUC)
+    total_auc = sum(r["auc"] for r in results.values())
+    ensemble_pred = sum(
+        predictions[name] * results[name]["auc"] / total_auc
+        for name in models
+    )
+    ensemble_auc = roc_auc_score(y, ensemble_pred)
+    results["ensemble"] = {"auc": round(ensemble_auc, 4)}
+    return results
+```
+## Model Evaluation
+### Evaluation Metrics for SDMs
+| Metric | Range | Interpretation |
+|--------|-------|---------------|
+| AUC | 0-1 | Discrimination ability (>0.7 useful, >0.8 good) |
+| TSS (True Skill Statistic) | -1 to 1 | Sensitivity + Specificity - 1 |
+| Boyce Index | -1 to 1 | Predicted-to-expected ratio consistency |
+| Kappa | -1 to 1 | Agreement beyond chance |
+```python
+def compute_tss(y_true: np.ndarray, y_pred_proba: np.ndarray) -> dict:
+    """
+    Compute TSS (True Skill Statistic) at the optimal threshold.
+    TSS = Sensitivity + Specificity - 1
+    """
+    from sklearn.metrics import roc_curve
+    fpr, tpr, thresholds = roc_curve(y_true, y_pred_proba)
+    specificity = 1 - fpr
+    tss_values = tpr + specificity - 1
+    optimal_idx = np.argmax(tss_values)
+    return {
+        "tss": round(tss_values[optimal_idx], 4),
+        "optimal_threshold": round(thresholds[optimal_idx], 4),
+        "sensitivity": round(tpr[optimal_idx], 4),
+        "specificity": round(specificity[optimal_idx], 4),
+    }
+```
+## Climate Change Projections
+### Projecting Habitat Shifts
+SDMs can project future suitable habitat under climate scenarios:
+1. Fit model on current climate + occurrences
+2. Obtain future climate rasters (CMIP6 SSP scenarios)
+3. Predict suitability on future climate surfaces
+4. Compare current vs future range to quantify shifts
+Key considerations:
+- Use multiple GCMs to capture model uncertainty
+- Apply clamping for novel climate combinations
+- Report range change metrics: area gained, area lost, centroid shift
+## Tools and Resources
+- **MaxEnt**: Maximum entropy SDM (Java, most cited SDM software)
+- **biomod2 (R)**: Ensemble SDM framework with 10+ algorithms
+- **Wallace (R Shiny)**: Interactive SDM workflow application
+- **pygbif / rgbif**: GBIF data access from Python/R
+- **rasterio / terra**: Raster data handling
+- **WorldClim (worldclim.org)**: Global climate data at up to 30 arc-second (~1 km) resolution
+- **CHELSA**: High-resolution climate data (better for mountainous regions)
+- **eBird**: Citizen science bird occurrence data (Cornell Lab)
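
Reviewer note on the added guide: the range change metrics named under "Projecting Habitat Shifts" (area gained, area lost, centroid shift) can be computed from two suitability grids. The sketch below is a minimal illustration, assuming both grids share the same extent and resolution, cells have equal area, and a single threshold (0.5 here, chosen arbitrarily) turns suitability into presence/absence. The function name `range_change` and its parameters are hypothetical, and the centroid shift is reported in grid (row, column) units rather than kilometres.

```python
import numpy as np


def range_change(current, future, threshold=0.5, cell_area_km2=1.0):
    """Compare binarized current and future suitability grids.

    current, future: 2-D arrays of suitability on the same grid.
    threshold: suitability cutoff for "suitable" (an assumed value).
    cell_area_km2: area of one cell, assumed equal for all cells.
    """
    cur = np.asarray(current) >= threshold
    fut = np.asarray(future) >= threshold

    def centroid(mask):
        # Mean (row, col) of suitable cells; NaN if no cell is suitable
        rows, cols = np.nonzero(mask)
        if rows.size == 0:
            return (np.nan, np.nan)
        return (rows.mean(), cols.mean())

    c_cur, c_fut = centroid(cur), centroid(fut)
    return {
        "area_gained_km2": float((~cur & fut).sum() * cell_area_km2),
        "area_lost_km2": float((cur & ~fut).sum() * cell_area_km2),
        "area_stable_km2": float((cur & fut).sum() * cell_area_km2),
        "centroid_shift_rows": float(c_fut[0] - c_cur[0]),
        "centroid_shift_cols": float(c_fut[1] - c_cur[1]),
    }


# Toy 2 x 3 grids: suitable habitat moves from the left to the right columns
current = [[0.9, 0.8, 0.1],
           [0.7, 0.2, 0.1]]
future = [[0.1, 0.8, 0.9],
          [0.2, 0.3, 0.6]]
metrics = range_change(current, future)
```

For geographic (latitude/longitude) rasters the equal-area assumption does not hold, so cell areas would need to be weighted by latitude or the grids reprojected to an equal-area projection before reporting areas.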