@wentorai/research-plugins 1.0.0 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +22 -22
- package/curated/analysis/README.md +71 -56
- package/curated/domains/README.md +176 -67
- package/curated/literature/README.md +71 -47
- package/curated/research/README.md +91 -58
- package/curated/tools/README.md +88 -87
- package/curated/writing/README.md +80 -45
- package/mcp-configs/cloud-docs/confluence-mcp.json +37 -0
- package/mcp-configs/cloud-docs/google-drive-mcp.json +35 -0
- package/mcp-configs/cloud-docs/notion-mcp.json +29 -0
- package/mcp-configs/communication/discord-mcp.json +29 -0
- package/mcp-configs/communication/slack-mcp.json +29 -0
- package/mcp-configs/communication/telegram-mcp.json +28 -0
- package/mcp-configs/database/neo4j-mcp.json +37 -0
- package/mcp-configs/database/postgres-mcp.json +28 -0
- package/mcp-configs/database/sqlite-mcp.json +29 -0
- package/mcp-configs/dev-platform/github-mcp.json +31 -0
- package/mcp-configs/dev-platform/gitlab-mcp.json +34 -0
- package/mcp-configs/email/email-mcp.json +40 -0
- package/mcp-configs/email/gmail-mcp.json +37 -0
- package/mcp-configs/registry.json +178 -149
- package/mcp-configs/repository/dataverse-mcp.json +33 -0
- package/mcp-configs/repository/huggingface-mcp.json +29 -0
- package/openclaw.plugin.json +2 -2
- package/package.json +2 -2
- package/skills/analysis/dataviz/algorithm-visualizer-guide/SKILL.md +259 -0
- package/skills/analysis/dataviz/bokeh-visualization-guide/SKILL.md +270 -0
- package/skills/analysis/dataviz/chart-image-generator/SKILL.md +229 -0
- package/skills/analysis/dataviz/d3-visualization-guide/SKILL.md +281 -0
- package/skills/analysis/dataviz/echarts-visualization-guide/SKILL.md +250 -0
- package/skills/analysis/dataviz/metabase-analytics-guide/SKILL.md +242 -0
- package/skills/analysis/dataviz/plotly-interactive-guide/SKILL.md +266 -0
- package/skills/analysis/dataviz/redash-analytics-guide/SKILL.md +284 -0
- package/skills/analysis/econometrics/econml-causal-guide/SKILL.md +163 -0
- package/skills/analysis/econometrics/mostly-harmless-guide/SKILL.md +139 -0
- package/skills/analysis/econometrics/panel-data-analyst/SKILL.md +259 -0
- package/skills/analysis/econometrics/python-causality-guide/SKILL.md +134 -0
- package/skills/analysis/econometrics/stata-accounting-guide/SKILL.md +269 -0
- package/skills/analysis/econometrics/stata-analyst-guide/SKILL.md +245 -0
- package/skills/analysis/statistics/data-anomaly-detection/SKILL.md +157 -0
- package/skills/analysis/statistics/ml-experiment-tracker/SKILL.md +212 -0
- package/skills/analysis/statistics/pywayne-statistics-guide/SKILL.md +192 -0
- package/skills/analysis/statistics/quantitative-methods-guide/SKILL.md +193 -0
- package/skills/analysis/statistics/senior-data-scientist-guide/SKILL.md +223 -0
- package/skills/analysis/wrangling/csv-data-analyzer/SKILL.md +170 -0
- package/skills/analysis/wrangling/data-cleaning-pipeline/SKILL.md +266 -0
- package/skills/analysis/wrangling/data-cog-guide/SKILL.md +178 -0
- package/skills/analysis/wrangling/stata-data-cleaning/SKILL.md +276 -0
- package/skills/analysis/wrangling/survey-data-processing/SKILL.md +298 -0
- package/skills/domains/ai-ml/ai-model-benchmarking/SKILL.md +209 -0
- package/skills/domains/ai-ml/annotated-dl-papers-guide/SKILL.md +159 -0
- package/skills/domains/ai-ml/dl-transformer-finetune/SKILL.md +239 -0
- package/skills/domains/ai-ml/generative-ai-guide/SKILL.md +146 -0
- package/skills/domains/ai-ml/huggingface-inference-guide/SKILL.md +196 -0
- package/skills/domains/ai-ml/keras-deep-learning/SKILL.md +210 -0
- package/skills/domains/ai-ml/llm-from-scratch-guide/SKILL.md +124 -0
- package/skills/domains/ai-ml/ml-pipeline-guide/SKILL.md +295 -0
- package/skills/domains/ai-ml/nlp-toolkit-guide/SKILL.md +247 -0
- package/skills/domains/ai-ml/pytorch-guide/SKILL.md +281 -0
- package/skills/domains/ai-ml/pytorch-lightning-guide/SKILL.md +244 -0
- package/skills/domains/ai-ml/tensorflow-guide/SKILL.md +241 -0
- package/skills/domains/biomedical/bioagents-guide/SKILL.md +308 -0
- package/skills/domains/biomedical/medgeclaw-guide/SKILL.md +345 -0
- package/skills/domains/biomedical/medical-imaging-guide/SKILL.md +305 -0
- package/skills/domains/business/architecture-design-guide/SKILL.md +279 -0
- package/skills/domains/business/innovation-management-guide/SKILL.md +257 -0
- package/skills/domains/business/operations-research-guide/SKILL.md +258 -0
- package/skills/domains/chemistry/molecular-dynamics-guide/SKILL.md +237 -0
- package/skills/domains/chemistry/pubchem-api-guide/SKILL.md +180 -0
- package/skills/domains/chemistry/spectroscopy-analysis-guide/SKILL.md +290 -0
- package/skills/domains/cs/distributed-systems-guide/SKILL.md +268 -0
- package/skills/domains/cs/formal-verification-guide/SKILL.md +298 -0
- package/skills/domains/ecology/species-distribution-guide/SKILL.md +343 -0
- package/skills/domains/economics/imf-data-api-guide/SKILL.md +174 -0
- package/skills/domains/economics/post-labor-economics/SKILL.md +254 -0
- package/skills/domains/economics/pricing-psychology-guide/SKILL.md +273 -0
- package/skills/domains/economics/world-bank-data-guide/SKILL.md +179 -0
- package/skills/domains/education/assessment-design-guide/SKILL.md +213 -0
- package/skills/domains/education/educational-research-methods/SKILL.md +179 -0
- package/skills/domains/education/mooc-analytics-guide/SKILL.md +206 -0
- package/skills/domains/finance/portfolio-optimization-guide/SKILL.md +279 -0
- package/skills/domains/finance/risk-modeling-guide/SKILL.md +260 -0
- package/skills/domains/finance/stata-accounting-research/SKILL.md +372 -0
- package/skills/domains/geoscience/climate-modeling-guide/SKILL.md +215 -0
- package/skills/domains/geoscience/satellite-remote-sensing/SKILL.md +193 -0
- package/skills/domains/geoscience/seismology-data-guide/SKILL.md +208 -0
- package/skills/domains/humanities/ethical-philosophy-guide/SKILL.md +244 -0
- package/skills/domains/humanities/history-research-guide/SKILL.md +260 -0
- package/skills/domains/humanities/political-history-guide/SKILL.md +241 -0
- package/skills/domains/law/legal-nlp-guide/SKILL.md +236 -0
- package/skills/domains/law/patent-analysis-guide/SKILL.md +257 -0
- package/skills/domains/law/regulatory-compliance-guide/SKILL.md +267 -0
- package/skills/domains/math/symbolic-computation-guide/SKILL.md +263 -0
- package/skills/domains/math/topology-data-analysis/SKILL.md +305 -0
- package/skills/domains/pharma/clinical-trial-design-guide/SKILL.md +271 -0
- package/skills/domains/pharma/drug-target-interaction/SKILL.md +242 -0
- package/skills/domains/pharma/pharmacovigilance-guide/SKILL.md +216 -0
- package/skills/domains/physics/astrophysics-data-guide/SKILL.md +305 -0
- package/skills/domains/physics/particle-physics-guide/SKILL.md +287 -0
- package/skills/domains/social-science/network-analysis-guide/SKILL.md +310 -0
- package/skills/domains/social-science/psychology-research-guide/SKILL.md +270 -0
- package/skills/domains/social-science/sociology-research-guide/SKILL.md +238 -0
- package/skills/literature/discovery/paper-recommendation-guide/SKILL.md +120 -0
- package/skills/literature/discovery/semantic-paper-radar/SKILL.md +144 -0
- package/skills/literature/discovery/zotero-arxiv-daily-guide/SKILL.md +94 -0
- package/skills/literature/fulltext/core-api-guide/SKILL.md +144 -0
- package/skills/literature/fulltext/institutional-repository-guide/SKILL.md +212 -0
- package/skills/literature/fulltext/open-access-mining-guide/SKILL.md +341 -0
- package/skills/literature/metadata/academic-paper-summarizer/SKILL.md +101 -0
- package/skills/literature/metadata/wikidata-api-guide/SKILL.md +156 -0
- package/skills/literature/search/arxiv-batch-reporting/SKILL.md +133 -0
- package/skills/literature/search/arxiv-paper-processor/SKILL.md +141 -0
- package/skills/literature/search/baidu-scholar-guide/SKILL.md +110 -0
- package/skills/literature/search/chatpaper-guide/SKILL.md +122 -0
- package/skills/literature/search/deep-literature-search/SKILL.md +149 -0
- package/skills/literature/search/deepgit-search-guide/SKILL.md +147 -0
- package/skills/literature/search/pasa-paper-search-guide/SKILL.md +138 -0
- package/skills/research/automation/ai-scientist-v2-guide/SKILL.md +284 -0
- package/skills/research/automation/aim-experiment-guide/SKILL.md +234 -0
- package/skills/research/automation/datagen-research-guide/SKILL.md +131 -0
- package/skills/research/automation/kedro-pipeline-guide/SKILL.md +216 -0
- package/skills/research/automation/mle-agent-guide/SKILL.md +139 -0
- package/skills/research/automation/paper-to-agent-guide/SKILL.md +116 -0
- package/skills/research/automation/rd-agent-guide/SKILL.md +246 -0
- package/skills/research/automation/research-paper-orchestrator/SKILL.md +254 -0
- package/skills/research/deep-research/academic-deep-research/SKILL.md +190 -0
- package/skills/research/deep-research/auto-deep-research-guide/SKILL.md +141 -0
- package/skills/research/deep-research/deep-research-pro/SKILL.md +213 -0
- package/skills/research/deep-research/deep-research-work/SKILL.md +204 -0
- package/skills/research/deep-research/deep-searcher-guide/SKILL.md +253 -0
- package/skills/research/deep-research/gpt-researcher-guide/SKILL.md +191 -0
- package/skills/research/deep-research/khoj-research-guide/SKILL.md +200 -0
- package/skills/research/deep-research/local-deep-research-guide/SKILL.md +253 -0
- package/skills/research/deep-research/tongyi-deep-research-guide/SKILL.md +217 -0
- package/skills/research/funding/eu-horizon-guide/SKILL.md +244 -0
- package/skills/research/funding/grant-budget-guide/SKILL.md +284 -0
- package/skills/research/funding/nih-reporter-api-guide/SKILL.md +166 -0
- package/skills/research/funding/nsf-award-api-guide/SKILL.md +133 -0
- package/skills/research/methodology/academic-mentor-guide/SKILL.md +169 -0
- package/skills/research/methodology/claude-scientific-guide/SKILL.md +122 -0
- package/skills/research/methodology/deep-innovator-guide/SKILL.md +242 -0
- package/skills/research/methodology/osf-api-guide/SKILL.md +165 -0
- package/skills/research/methodology/research-paper-kb/SKILL.md +263 -0
- package/skills/research/methodology/research-town-guide/SKILL.md +263 -0
- package/skills/research/paper-review/automated-review-guide/SKILL.md +281 -0
- package/skills/research/paper-review/paper-compare-guide/SKILL.md +238 -0
- package/skills/research/paper-review/paper-digest-guide/SKILL.md +240 -0
- package/skills/research/paper-review/paper-research-assistant/SKILL.md +231 -0
- package/skills/research/paper-review/research-quality-filter/SKILL.md +261 -0
- package/skills/research/paper-review/review-response-guide/SKILL.md +275 -0
- package/skills/tools/code-exec/google-colab-guide/SKILL.md +276 -0
- package/skills/tools/code-exec/kaggle-api-guide/SKILL.md +216 -0
- package/skills/tools/code-exec/overleaf-cli-guide/SKILL.md +279 -0
- package/skills/tools/diagram/code-flow-visualizer/SKILL.md +197 -0
- package/skills/tools/diagram/excalidraw-diagram-guide/SKILL.md +170 -0
- package/skills/tools/diagram/json-data-visualizer/SKILL.md +270 -0
- package/skills/tools/diagram/mermaid-architect-guide/SKILL.md +219 -0
- package/skills/tools/diagram/tldraw-whiteboard-guide/SKILL.md +397 -0
- package/skills/tools/document/docsgpt-guide/SKILL.md +130 -0
- package/skills/tools/document/large-document-reader/SKILL.md +202 -0
- package/skills/tools/document/paper-parse-guide/SKILL.md +243 -0
- package/skills/tools/knowledge-graph/citation-network-builder/SKILL.md +244 -0
- package/skills/tools/knowledge-graph/concept-map-generator/SKILL.md +284 -0
- package/skills/tools/knowledge-graph/graphiti-guide/SKILL.md +219 -0
- package/skills/tools/ocr-translate/pdf-math-translate-guide/SKILL.md +141 -0
- package/skills/tools/ocr-translate/zotero-pdf-translate-guide/SKILL.md +95 -0
- package/skills/tools/ocr-translate/zotero-pdf2zh-guide/SKILL.md +143 -0
- package/skills/tools/scraping/dataset-finder-guide/SKILL.md +253 -0
- package/skills/tools/scraping/easy-spider-guide/SKILL.md +250 -0
- package/skills/tools/scraping/google-scholar-scraper/SKILL.md +255 -0
- package/skills/tools/scraping/repository-harvesting-guide/SKILL.md +310 -0
- package/skills/writing/citation/academic-citation-manager/SKILL.md +314 -0
- package/skills/writing/citation/jabref-reference-guide/SKILL.md +127 -0
- package/skills/writing/citation/jasminum-zotero-guide/SKILL.md +103 -0
- package/skills/writing/citation/obsidian-citation-guide/SKILL.md +164 -0
- package/skills/writing/citation/obsidian-zotero-guide/SKILL.md +137 -0
- package/skills/writing/citation/papersgpt-zotero-guide/SKILL.md +132 -0
- package/skills/writing/citation/papis-cli-guide/SKILL.md +213 -0
- package/skills/writing/citation/zotero-better-bibtex-guide/SKILL.md +107 -0
- package/skills/writing/citation/zotero-better-notes-guide/SKILL.md +121 -0
- package/skills/writing/citation/zotero-gpt-guide/SKILL.md +111 -0
- package/skills/writing/citation/zotero-mcp-guide/SKILL.md +164 -0
- package/skills/writing/citation/zotero-mdnotes-guide/SKILL.md +162 -0
- package/skills/writing/citation/zotero-reference-guide/SKILL.md +139 -0
- package/skills/writing/citation/zotero-scholar-guide/SKILL.md +294 -0
- package/skills/writing/citation/zotfile-attachment-guide/SKILL.md +140 -0
- package/skills/writing/composition/ml-paper-writing/SKILL.md +163 -0
- package/skills/writing/composition/paper-debugger-guide/SKILL.md +143 -0
- package/skills/writing/composition/scientific-writing-resources/SKILL.md +151 -0
- package/skills/writing/composition/scientific-writing-wrapper/SKILL.md +153 -0
- package/skills/writing/latex/latex-drawing-collection/SKILL.md +154 -0
- package/skills/writing/latex/latex-templates-collection/SKILL.md +159 -0
- package/skills/writing/latex/md-to-pdf-academic/SKILL.md +230 -0
- package/skills/writing/latex/tex-render-guide/SKILL.md +243 -0
- package/skills/writing/polish/academic-tone-guide/SKILL.md +209 -0
- package/skills/writing/polish/conciseness-editing-guide/SKILL.md +225 -0
- package/skills/writing/polish/paper-polish-guide/SKILL.md +160 -0
- package/skills/writing/templates/graphical-abstract-guide/SKILL.md +183 -0
- package/skills/writing/templates/novathesis-guide/SKILL.md +152 -0
- package/skills/writing/templates/scientific-article-pdf/SKILL.md +261 -0
- package/skills/writing/templates/sjtuthesis-guide/SKILL.md +197 -0
- package/skills/writing/templates/thuthesis-guide/SKILL.md +181 -0
- package/skills/literature/fulltext/repository-harvesting-guide/SKILL.md +0 -207
|
@@ -0,0 +1,290 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: spectroscopy-analysis-guide
|
|
3
|
+
description: "Spectral data analysis for NMR, IR, mass spectrometry, and UV-Vis"
|
|
4
|
+
metadata:
|
|
5
|
+
openclaw:
|
|
6
|
+
emoji: "microscope"
|
|
7
|
+
category: "domains"
|
|
8
|
+
subcategory: "chemistry"
|
|
9
|
+
keywords: ["spectroscopy", "nmr", "mass-spectrometry", "infrared", "uv-vis", "analytical-chemistry"]
|
|
10
|
+
source: "wentor"
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
# Spectroscopy Analysis Guide
|
|
14
|
+
|
|
15
|
+
A skill for processing and interpreting spectroscopic data in chemistry research. Covers NMR, IR, mass spectrometry, and UV-Vis spectroscopy including data formats, baseline correction, peak detection, spectral matching, and structure elucidation workflows.
|
|
16
|
+
|
|
17
|
+
## Spectral Data Formats
|
|
18
|
+
|
|
19
|
+
### Common File Formats
|
|
20
|
+
|
|
21
|
+
| Format | Spectroscopy | Description |
|
|
22
|
+
|--------|-------------|-------------|
|
|
23
|
+
| JCAMP-DX (.jdx, .dx) | All types | IUPAC standard exchange format |
|
|
24
|
+
| Bruker (1r, fid, acqu) | NMR | Raw and processed Bruker data |
|
|
25
|
+
| mzML / mzXML | MS | Open mass spectrometry format |
|
|
26
|
+
| SPC (.spc) | IR, UV-Vis | Galactic/Thermo spectral format |
|
|
27
|
+
| CSV / TXT | All | Simple x,y pairs (wavelength/wavenumber, intensity) |
|
|
28
|
+
|
|
29
|
+
### Reading Spectral Data
|
|
30
|
+
|
|
31
|
+
```python
|
|
32
|
+
import numpy as np
|
|
33
|
+
from scipy.signal import find_peaks, savgol_filter
|
|
34
|
+
|
|
35
|
+
def read_jcamp(filepath: str) -> dict:
|
|
36
|
+
"""
|
|
37
|
+
Read a JCAMP-DX spectral file.
|
|
38
|
+
Returns x (wavenumber/chemical shift/m/z) and y (intensity) arrays.
|
|
39
|
+
"""
|
|
40
|
+
x_data, y_data = [], []
|
|
41
|
+
metadata = {}
|
|
42
|
+
|
|
43
|
+
with open(filepath, "r") as f:
|
|
44
|
+
for line in f:
|
|
45
|
+
line = line.strip()
|
|
46
|
+
if line.startswith("##"):
|
|
47
|
+
key_val = line[2:].split("=", 1)
|
|
48
|
+
if len(key_val) == 2:
|
|
49
|
+
metadata[key_val[0].strip()] = key_val[1].strip()
|
|
50
|
+
elif line and not line.startswith("$$"):
|
|
51
|
+
parts = line.split()
|
|
52
|
+
try:
|
|
53
|
+
values = [float(v) for v in parts]
|
|
54
|
+
if len(values) >= 2:
|
|
55
|
+
x_data.append(values[0])
|
|
56
|
+
y_data.extend(values[1:])
|
|
57
|
+
except ValueError:
|
|
58
|
+
continue
|
|
59
|
+
|
|
60
|
+
return {
|
|
61
|
+
"x": np.array(x_data),
|
|
62
|
+
"y": np.array(y_data[:len(x_data)]),
|
|
63
|
+
"metadata": metadata,
|
|
64
|
+
}
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
## NMR Spectroscopy
|
|
68
|
+
|
|
69
|
+
### 1H NMR Processing
|
|
70
|
+
|
|
71
|
+
```python
|
|
72
|
+
import nmrglue as ng
|
|
73
|
+
|
|
74
|
+
def process_1h_nmr(bruker_dir: str) -> dict:
|
|
75
|
+
"""
|
|
76
|
+
Process 1H NMR data from Bruker format using nmrglue.
|
|
77
|
+
bruker_dir: path to Bruker experiment directory
|
|
78
|
+
"""
|
|
79
|
+
# Read raw data
|
|
80
|
+
dic, data = ng.bruker.read(bruker_dir)
|
|
81
|
+
|
|
82
|
+
# Apply processing
|
|
83
|
+
data = ng.bruker.remove_digital_filter(dic, data)
|
|
84
|
+
data = ng.proc_base.zf_size(data, 65536) # zero-fill
|
|
85
|
+
data = ng.proc_base.fft(data) # Fourier transform
|
|
86
|
+
data = ng.proc_autophase.autops(data, "acme") # automatic phasing
|
|
87
|
+
data = ng.proc_base.rev(data) # reverse spectrum
|
|
88
|
+
data = ng.proc_base.di(data) # discard imaginary
|
|
89
|
+
|
|
90
|
+
# Generate chemical shift axis (ppm)
|
|
91
|
+
udic = ng.bruker.guess_udic(dic, data)
|
|
92
|
+
uc = ng.fileiobase.uc_from_udic(udic)
|
|
93
|
+
ppm = uc.ppm_scale()
|
|
94
|
+
|
|
95
|
+
return {
|
|
96
|
+
"ppm": ppm,
|
|
97
|
+
"spectrum": data.real,
|
|
98
|
+
"sf": dic["acqus"]["SFO1"], # spectrometer frequency (MHz)
|
|
99
|
+
"sw_ppm": dic["acqus"]["SW"], # sweep width (ppm)
|
|
100
|
+
}
|
|
101
|
+
|
|
102
|
+
def pick_nmr_peaks(ppm: np.ndarray, spectrum: np.ndarray,
|
|
103
|
+
threshold: float = 0.05) -> list[dict]:
|
|
104
|
+
"""
|
|
105
|
+
Automatic peak picking for 1H NMR.
|
|
106
|
+
threshold: minimum peak height as fraction of max intensity.
|
|
107
|
+
"""
|
|
108
|
+
min_height = threshold * np.max(spectrum)
|
|
109
|
+
indices, properties = find_peaks(
|
|
110
|
+
spectrum, height=min_height, distance=10, prominence=min_height * 0.5
|
|
111
|
+
)
|
|
112
|
+
|
|
113
|
+
peaks = []
|
|
114
|
+
for idx in indices:
|
|
115
|
+
peaks.append({
|
|
116
|
+
"ppm": round(float(ppm[idx]), 3),
|
|
117
|
+
"intensity": float(spectrum[idx]),
|
|
118
|
+
})
|
|
119
|
+
|
|
120
|
+
# Sort by chemical shift (high to low, NMR convention)
|
|
121
|
+
peaks.sort(key=lambda p: p["ppm"], reverse=True)
|
|
122
|
+
return peaks
|
|
123
|
+
```
|
|
124
|
+
|
|
125
|
+
### Common 1H NMR Chemical Shift Ranges
|
|
126
|
+
|
|
127
|
+
| Chemical Shift (ppm) | Functional Group |
|
|
128
|
+
|----------------------|-----------------|
|
|
129
|
+
| 0.8-1.0 | CH3 (methyl, alkyl) |
|
|
130
|
+
| 1.2-1.4 | CH2 (methylene, alkyl chain) |
|
|
131
|
+
| 2.0-2.5 | CH next to C=O |
|
|
132
|
+
| 3.3-3.9 | CH next to O or N (ethers, amines) |
|
|
133
|
+
| 4.5-5.5 | Vinyl C=CH2, OCH |
|
|
134
|
+
| 6.5-8.5 | Aromatic H |
|
|
135
|
+
| 9.0-10.0 | Aldehyde CHO |
|
|
136
|
+
| 10.0-12.0 | Carboxylic acid OH |
|
|
137
|
+
|
|
138
|
+
## Mass Spectrometry
|
|
139
|
+
|
|
140
|
+
### Processing MS Data
|
|
141
|
+
|
|
142
|
+
```python
|
|
143
|
+
from pyteomics import mzml
|
|
144
|
+
import numpy as np
|
|
145
|
+
|
|
146
|
+
def read_mzml_spectra(filepath: str, ms_level: int = 1) -> list[dict]:
|
|
147
|
+
"""
|
|
148
|
+
Read mass spectra from an mzML file.
|
|
149
|
+
ms_level: 1 for MS1 (survey scans), 2 for MS/MS
|
|
150
|
+
"""
|
|
151
|
+
spectra = []
|
|
152
|
+
with mzml.read(filepath) as reader:
|
|
153
|
+
for spectrum in reader:
|
|
154
|
+
if spectrum.get("ms level") == ms_level:
|
|
155
|
+
spectra.append({
|
|
156
|
+
"scan": spectrum["index"],
|
|
157
|
+
"rt": spectrum["scanList"]["scan"][0].get(
|
|
158
|
+
"scan start time", 0
|
|
159
|
+
),
|
|
160
|
+
"mz": spectrum["m/z array"],
|
|
161
|
+
"intensity": spectrum["intensity array"],
|
|
162
|
+
"tic": np.sum(spectrum["intensity array"]),
|
|
163
|
+
})
|
|
164
|
+
return spectra
|
|
165
|
+
|
|
166
|
+
def find_molecular_ion(mz: np.ndarray, intensity: np.ndarray,
|
|
167
|
+
expected_mw: float = None,
|
|
168
|
+
tolerance_da: float = 0.5) -> list[dict]:
|
|
169
|
+
"""
|
|
170
|
+
Identify molecular ion peaks ([M+H]+, [M+Na]+, [M-H]-).
|
|
171
|
+
"""
|
|
172
|
+
# Find top peaks
|
|
173
|
+
top_indices = np.argsort(intensity)[::-1][:20]
|
|
174
|
+
candidates = []
|
|
175
|
+
|
|
176
|
+
adducts = {
|
|
177
|
+
"[M+H]+": 1.00728,
|
|
178
|
+
"[M+Na]+": 22.98922,
|
|
179
|
+
"[M+K]+": 38.96316,
|
|
180
|
+
"[M-H]-": -1.00728,
|
|
181
|
+
"[M+NH4]+": 18.03437,
|
|
182
|
+
}
|
|
183
|
+
|
|
184
|
+
for idx in top_indices:
|
|
185
|
+
peak_mz = mz[idx]
|
|
186
|
+
peak_int = intensity[idx]
|
|
187
|
+
|
|
188
|
+
if expected_mw:
|
|
189
|
+
for adduct_name, adduct_mass in adducts.items():
|
|
190
|
+
calc_mw = peak_mz - adduct_mass
|
|
191
|
+
if abs(calc_mw - expected_mw) < tolerance_da:
|
|
192
|
+
candidates.append({
|
|
193
|
+
"mz": round(float(peak_mz), 4),
|
|
194
|
+
"intensity": float(peak_int),
|
|
195
|
+
"adduct": adduct_name,
|
|
196
|
+
"calc_mw": round(calc_mw, 4),
|
|
197
|
+
"error_da": round(abs(calc_mw - expected_mw), 4),
|
|
198
|
+
})
|
|
199
|
+
else:
|
|
200
|
+
candidates.append({
|
|
201
|
+
"mz": round(float(peak_mz), 4),
|
|
202
|
+
"intensity": float(peak_int),
|
|
203
|
+
})
|
|
204
|
+
|
|
205
|
+
return candidates
|
|
206
|
+
```
|
|
207
|
+
|
|
208
|
+
## Infrared Spectroscopy
|
|
209
|
+
|
|
210
|
+
### IR Peak Assignment
|
|
211
|
+
|
|
212
|
+
```python
|
|
213
|
+
# Standard IR functional group frequency table
|
|
214
|
+
IR_ASSIGNMENTS = {
|
|
215
|
+
(3200, 3600): "O-H stretch (broad: alcohol, acid; sharp: free OH)",
|
|
216
|
+
(3300, 3500): "N-H stretch (primary amine: 2 bands; secondary: 1 band)",
|
|
217
|
+
(2850, 3000): "C-H stretch (sp3: 2850-2960; sp2: 3000-3100)",
|
|
218
|
+
(2100, 2260): "Triple bond stretch (C-triple-N: 2210-2260; C-triple-C: 2100-2150)",
|
|
219
|
+
(1680, 1750): "C=O stretch (ketone ~1715; ester ~1735; acid ~1710; amide ~1650)",
|
|
220
|
+
(1600, 1680): "C=C stretch (alkene ~1640; aromatic ~1600, 1500)",
|
|
221
|
+
(1000, 1300): "C-O stretch (ether, ester, alcohol)",
|
|
222
|
+
}
|
|
223
|
+
|
|
224
|
+
def assign_ir_peaks(wavenumber: np.ndarray, absorbance: np.ndarray,
|
|
225
|
+
threshold: float = 0.1) -> list[dict]:
|
|
226
|
+
"""Detect and assign IR absorption peaks to functional groups."""
|
|
227
|
+
# Invert for peak detection (absorbance peaks are positive)
|
|
228
|
+
peaks, properties = find_peaks(absorbance, height=threshold, prominence=0.05)
|
|
229
|
+
|
|
230
|
+
assignments = []
|
|
231
|
+
for idx in peaks:
|
|
232
|
+
wn = float(wavenumber[idx])
|
|
233
|
+
assignment = "unassigned"
|
|
234
|
+
for (low, high), group in IR_ASSIGNMENTS.items():
|
|
235
|
+
if low <= wn <= high:
|
|
236
|
+
assignment = group
|
|
237
|
+
break
|
|
238
|
+
assignments.append({
|
|
239
|
+
"wavenumber_cm-1": round(wn, 1),
|
|
240
|
+
"absorbance": round(float(absorbance[idx]), 4),
|
|
241
|
+
"assignment": assignment,
|
|
242
|
+
})
|
|
243
|
+
|
|
244
|
+
return sorted(assignments, key=lambda x: x["wavenumber_cm-1"], reverse=True)
|
|
245
|
+
```
|
|
246
|
+
|
|
247
|
+
## Spectral Processing Utilities
|
|
248
|
+
|
|
249
|
+
### Baseline Correction and Smoothing
|
|
250
|
+
|
|
251
|
+
```python
|
|
252
|
+
def baseline_correction(y: np.ndarray, lam: float = 1e6,
|
|
253
|
+
p: float = 0.001, n_iter: int = 10) -> np.ndarray:
|
|
254
|
+
"""
|
|
255
|
+
Asymmetric least squares baseline correction (Eilers and Boelens, 2005).
|
|
256
|
+
lam: smoothness parameter (larger = smoother baseline)
|
|
257
|
+
p: asymmetry parameter (smaller = more emphasis on fitting below peaks)
|
|
258
|
+
"""
|
|
259
|
+
from scipy.sparse import diags, csc_matrix
|
|
260
|
+
from scipy.sparse.linalg import spsolve
|
|
261
|
+
|
|
262
|
+
L = len(y)
|
|
263
|
+
D = diags([1, -2, 1], [0, -1, -2], shape=(L, L - 2)).toarray()
|
|
264
|
+
H = lam * D.dot(D.T)
|
|
265
|
+
w = np.ones(L)
|
|
266
|
+
|
|
267
|
+
for _ in range(n_iter):
|
|
268
|
+
W = diags(w, 0, shape=(L, L))
|
|
269
|
+
Z = csc_matrix(W + H)
|
|
270
|
+
baseline = spsolve(Z, w * y)
|
|
271
|
+
w = p * (y > baseline) + (1 - p) * (y < baseline)
|
|
272
|
+
|
|
273
|
+
return y - baseline
|
|
274
|
+
|
|
275
|
+
def smooth_spectrum(y: np.ndarray, window: int = 11,
|
|
276
|
+
polyorder: int = 3) -> np.ndarray:
|
|
277
|
+
"""Apply Savitzky-Golay smoothing to a spectrum."""
|
|
278
|
+
return savgol_filter(y, window, polyorder)
|
|
279
|
+
```
|
|
280
|
+
|
|
281
|
+
## Tools and Software
|
|
282
|
+
|
|
283
|
+
- **nmrglue**: Python NMR data processing (Bruker, Varian, Agilent)
|
|
284
|
+
- **pyOpenMS / pyteomics**: Mass spectrometry data processing
|
|
285
|
+
- **RDKit**: Molecular structure to predicted spectra
|
|
286
|
+
- **MestReNova**: Commercial NMR processing (widely used in chemistry labs)
|
|
287
|
+
- **TopSpin (Bruker)**: NMR acquisition and processing
|
|
288
|
+
- **SDBS (AIST)**: Free spectral database (IR, NMR, MS)
|
|
289
|
+
- **MassBank**: Open mass spectral database
|
|
290
|
+
- **NIST Chemistry WebBook**: Reference spectra for IR and MS
|
|
@@ -0,0 +1,268 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: distributed-systems-guide
|
|
3
|
+
description: "Distributed systems design patterns and analysis for CS research"
|
|
4
|
+
metadata:
|
|
5
|
+
openclaw:
|
|
6
|
+
emoji: "globe-with-meridians"
|
|
7
|
+
category: "domains"
|
|
8
|
+
subcategory: "cs"
|
|
9
|
+
keywords: ["distributed-systems", "consensus", "replication", "fault-tolerance", "scalability", "cap-theorem"]
|
|
10
|
+
source: "wentor"
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
# Distributed Systems Guide
|
|
14
|
+
|
|
15
|
+
A skill for researching and designing distributed systems, covering consensus algorithms, replication strategies, consistency models, fault tolerance, and performance analysis. Provides theoretical foundations and practical implementations relevant to systems research.
|
|
16
|
+
|
|
17
|
+
## Consistency Models
|
|
18
|
+
|
|
19
|
+
### Consistency Hierarchy
|
|
20
|
+
|
|
21
|
+
```
|
|
22
|
+
Strongest
|
|
23
|
+
| Linearizability (atomic, real-time ordering)
|
|
24
|
+
| Sequential consistency (program order respected)
|
|
25
|
+
| Causal consistency (causally related ops ordered)
|
|
26
|
+
| PRAM / FIFO consistency (per-process order)
|
|
27
|
+
| Eventual consistency (converges if updates stop)
|
|
28
|
+
Weakest
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
### CAP Theorem and PACELC
|
|
32
|
+
|
|
33
|
+
The CAP theorem states that during a network partition, a distributed system must choose between consistency and availability:
|
|
34
|
+
|
|
35
|
+
| System | Partition Behavior | Normal Behavior | Classification |
|
|
36
|
+
|--------|-------------------|----------------|----------------|
|
|
37
|
+
| ZooKeeper | Consistent (sacrifice A) | Low latency, consistent | CP / PC/EC |
|
|
38
|
+
| Cassandra | Available (sacrifice C) | Low latency, eventual | AP / PA/EL |
|
|
39
|
+
| Spanner | Consistent (sacrifice A) | Higher latency, consistent | CP / PC/EC |
|
|
40
|
+
| DynamoDB | Configurable per-read | Tunable consistency | AP or CP |
|
|
41
|
+
| CockroachDB | Consistent (sacrifice A) | Serializable | CP / PC/EC |
|
|
42
|
+
|
|
43
|
+
## Consensus Algorithms
|
|
44
|
+
|
|
45
|
+
### Raft Implementation Sketch
|
|
46
|
+
|
|
47
|
+
```python
|
|
48
|
+
from enum import Enum
|
|
49
|
+
from dataclasses import dataclass, field
|
|
50
|
+
import random
|
|
51
|
+
|
|
52
|
+
class NodeState(Enum):
|
|
53
|
+
FOLLOWER = "follower"
|
|
54
|
+
CANDIDATE = "candidate"
|
|
55
|
+
LEADER = "leader"
|
|
56
|
+
|
|
57
|
+
@dataclass
|
|
58
|
+
class LogEntry:
|
|
59
|
+
term: int
|
|
60
|
+
index: int
|
|
61
|
+
command: str
|
|
62
|
+
|
|
63
|
+
@dataclass
|
|
64
|
+
class RaftNode:
|
|
65
|
+
"""
|
|
66
|
+
Simplified Raft consensus node for educational purposes.
|
|
67
|
+
Implements leader election and log replication state machine.
|
|
68
|
+
"""
|
|
69
|
+
node_id: str
|
|
70
|
+
state: NodeState = NodeState.FOLLOWER
|
|
71
|
+
current_term: int = 0
|
|
72
|
+
voted_for: str = None
|
|
73
|
+
log: list = field(default_factory=list)
|
|
74
|
+
commit_index: int = 0
|
|
75
|
+
last_applied: int = 0
|
|
76
|
+
|
|
77
|
+
# Leader state
|
|
78
|
+
next_index: dict = field(default_factory=dict)
|
|
79
|
+
match_index: dict = field(default_factory=dict)
|
|
80
|
+
|
|
81
|
+
def start_election(self, peers: list[str]) -> dict:
|
|
82
|
+
"""Transition to candidate and request votes."""
|
|
83
|
+
self.state = NodeState.CANDIDATE
|
|
84
|
+
self.current_term += 1
|
|
85
|
+
self.voted_for = self.node_id
|
|
86
|
+
|
|
87
|
+
last_log_index = len(self.log) - 1 if self.log else -1
|
|
88
|
+
last_log_term = self.log[-1].term if self.log else 0
|
|
89
|
+
|
|
90
|
+
return {
|
|
91
|
+
"type": "RequestVote",
|
|
92
|
+
"term": self.current_term,
|
|
93
|
+
"candidate_id": self.node_id,
|
|
94
|
+
"last_log_index": last_log_index,
|
|
95
|
+
"last_log_term": last_log_term,
|
|
96
|
+
}
|
|
97
|
+
|
|
98
|
+
def handle_vote_request(self, term: int, candidate_id: str,
|
|
99
|
+
last_log_index: int,
|
|
100
|
+
last_log_term: int) -> dict:
|
|
101
|
+
"""Process a RequestVote RPC."""
|
|
102
|
+
if term < self.current_term:
|
|
103
|
+
return {"term": self.current_term, "vote_granted": False}
|
|
104
|
+
|
|
105
|
+
if term > self.current_term:
|
|
106
|
+
self.current_term = term
|
|
107
|
+
self.state = NodeState.FOLLOWER
|
|
108
|
+
self.voted_for = None
|
|
109
|
+
|
|
110
|
+
# Check if candidate's log is at least as up-to-date
|
|
111
|
+
my_last_term = self.log[-1].term if self.log else 0
|
|
112
|
+
my_last_index = len(self.log) - 1 if self.log else -1
|
|
113
|
+
|
|
114
|
+
log_ok = (last_log_term > my_last_term or
|
|
115
|
+
(last_log_term == my_last_term and
|
|
116
|
+
last_log_index >= my_last_index))
|
|
117
|
+
|
|
118
|
+
vote_granted = (
|
|
119
|
+
(self.voted_for is None or self.voted_for == candidate_id)
|
|
120
|
+
and log_ok
|
|
121
|
+
)
|
|
122
|
+
|
|
123
|
+
if vote_granted:
|
|
124
|
+
self.voted_for = candidate_id
|
|
125
|
+
|
|
126
|
+
return {"term": self.current_term, "vote_granted": vote_granted}
|
|
127
|
+
|
|
128
|
+
def append_entry(self, command: str) -> LogEntry:
|
|
129
|
+
"""Leader appends a new entry to its log."""
|
|
130
|
+
entry = LogEntry(
|
|
131
|
+
term=self.current_term,
|
|
132
|
+
index=len(self.log),
|
|
133
|
+
command=command,
|
|
134
|
+
)
|
|
135
|
+
self.log.append(entry)
|
|
136
|
+
return entry
|
|
137
|
+
```
|
|
138
|
+
|
|
139
|
+
### Paxos vs Raft vs PBFT Comparison
|
|
140
|
+
|
|
141
|
+
| Algorithm | Fault Model | Tolerance | Rounds | Complexity |
|
|
142
|
+
|-----------|-------------|-----------|--------|------------|
|
|
143
|
+
| Paxos | Crash faults | f < n/2 | 2 (normal) | Difficult to implement correctly |
|
|
144
|
+
| Raft | Crash faults | f < n/2 | 2 (normal) | Designed for understandability |
|
|
145
|
+
| PBFT | Byzantine faults | f < n/3 | 3 | O(n^2) message complexity |
|
|
146
|
+
| HotStuff | Byzantine faults | f < n/3 | 3 | O(n) with pipelining |
|
|
147
|
+
|
|
148
|
+
## Replication Strategies
|
|
149
|
+
|
|
150
|
+
### State Machine Replication
|
|
151
|
+
|
|
152
|
+
```python
|
|
153
|
+
class ReplicatedStateMachine:
|
|
154
|
+
"""
|
|
155
|
+
State machine replication with configurable consistency.
|
|
156
|
+
Demonstrates read/write quorum intersection for correctness.
|
|
157
|
+
"""
|
|
158
|
+
|
|
159
|
+
def __init__(self, n_replicas: int, read_quorum: int = None,
|
|
160
|
+
write_quorum: int = None):
|
|
161
|
+
self.n = n_replicas
|
|
162
|
+
self.R = read_quorum or (n_replicas // 2 + 1)
|
|
163
|
+
self.W = write_quorum or (n_replicas // 2 + 1)
|
|
164
|
+
|
|
165
|
+
# Quorum intersection guarantees: R + W > N
|
|
166
|
+
assert self.R + self.W > self.n, (
|
|
167
|
+
f"Quorum intersection violated: R({self.R}) + W({self.W}) "
|
|
168
|
+
f"must be > N({self.n})"
|
|
169
|
+
)
|
|
170
|
+
|
|
171
|
+
self.replicas = [{} for _ in range(n_replicas)]
|
|
172
|
+
self.version_clock = 0
|
|
173
|
+
|
|
174
|
+
def write(self, key: str, value: str) -> dict:
|
|
175
|
+
"""Write to W replicas."""
|
|
176
|
+
self.version_clock += 1
|
|
177
|
+
# Select W replicas (in practice, based on availability)
|
|
178
|
+
targets = random.sample(range(self.n), self.W)
|
|
179
|
+
for i in targets:
|
|
180
|
+
self.replicas[i][key] = (value, self.version_clock)
|
|
181
|
+
|
|
182
|
+
return {
|
|
183
|
+
"key": key,
|
|
184
|
+
"version": self.version_clock,
|
|
185
|
+
"acked_by": len(targets),
|
|
186
|
+
"quorum_met": True,
|
|
187
|
+
}
|
|
188
|
+
|
|
189
|
+
def read(self, key: str) -> dict:
|
|
190
|
+
"""Read from R replicas, return latest version."""
|
|
191
|
+
targets = random.sample(range(self.n), self.R)
|
|
192
|
+
responses = []
|
|
193
|
+
for i in targets:
|
|
194
|
+
if key in self.replicas[i]:
|
|
195
|
+
responses.append(self.replicas[i][key])
|
|
196
|
+
|
|
197
|
+
if not responses:
|
|
198
|
+
return {"key": key, "value": None, "found": False}
|
|
199
|
+
|
|
200
|
+
# Return the value with the highest version
|
|
201
|
+
latest = max(responses, key=lambda x: x[1])
|
|
202
|
+
return {
|
|
203
|
+
"key": key,
|
|
204
|
+
"value": latest[0],
|
|
205
|
+
"version": latest[1],
|
|
206
|
+
"found": True,
|
|
207
|
+
}
|
|
208
|
+
```
|
|
209
|
+
|
|
210
|
+
## Clock Synchronization and Ordering

### Vector Clocks
|
|
213
|
+
|
|
214
|
+
```python
|
|
215
|
+
class VectorClock:
    """Vector clock for tracking causality in distributed systems."""

    def __init__(self, process_id: str, processes: list[str]):
        self.pid = process_id
        # One counter per known process, all starting at zero.
        self.clock = dict.fromkeys(processes, 0)

    def increment(self):
        """Local event: advance this process's own counter."""
        self.clock[self.pid] = self.clock[self.pid] + 1

    def send(self) -> dict:
        """Tick the local counter, then return a snapshot to ship with a message."""
        self.increment()
        return {p: t for p, t in self.clock.items()}

    def receive(self, other_clock: dict):
        """Merge a received clock in place (element-wise max), then tick locally."""
        for p, t in list(self.clock.items()):
            self.clock[p] = max(t, other_clock.get(p, 0))
        self.increment()

    def happened_before(self, other: dict) -> bool:
        """True iff this clock causally precedes *other*: <= everywhere, < somewhere."""
        never_ahead = all(t <= other.get(p, 0) for p, t in self.clock.items())
        strictly_behind = any(t < other.get(p, 0) for p, t in self.clock.items())
        return never_ahead and strictly_behind
|
|
241
|
+
```
|
|
242
|
+
|
|
243
|
+
## Performance Analysis

### Latency and Throughput Modeling

Key metrics for evaluating distributed systems:

- **Tail latency (p99, p999)**: Critical for real-world SLAs; often dominated by slow replicas
- **Throughput under contention**: How performance degrades with conflict rate
- **Scalability**: Linear vs sub-linear throughput increase with added nodes
- **Recovery time**: Time to restore consistency after node failure
|
|
253
|
+
|
|
254
|
+
## Key Research Papers

- Lamport, L. (1998). The Part-Time Parliament (Paxos). *ACM TOCS*.
- Ongaro, D. and Ousterhout, J. (2014). In Search of an Understandable Consensus Algorithm (Raft). *USENIX ATC*.
- Corbett, J. et al. (2013). Spanner: Google's Globally-Distributed Database. *ACM TOCS*.
- DeCandia, G. et al. (2007). Dynamo: Amazon's Highly Available Key-value Store. *SOSP*.
|
|
260
|
+
|
|
261
|
+
## Tools and Frameworks

- **etcd / ZooKeeper**: Production consensus stores for coordination
- **Jepsen**: Distributed systems correctness testing framework
- **TLA+ / PlusCal**: Formal specification and model checking
- **ns-3 / OMNeT++**: Network simulation for distributed protocols
- **gRPC / Cap'n Proto**: High-performance RPC frameworks
- **FoundationDB**: Multi-model distributed database with strong consistency
|