npm - @wentorai/research-plugins - Versions diffs - 1.0.0 → 1.1.0 - Mend

@wentorai/research-plugins 1.0.0 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (203)

package/skills/domains/physics/particle-physics-guide/SKILL.md ADDED

@@ -0,0 +1,287 @@
+---
+name: particle-physics-guide
+description: "Particle physics data analysis with ROOT, HEPData, and event processing"
+metadata:
+  openclaw:
+    emoji: "cyclone"
+    category: "domains"
+    subcategory: "physics"
+    keywords: ["particle-physics", "root", "hepdata", "collider", "event-analysis", "high-energy-physics"]
+    source: "wentor"
+---
+# Particle Physics Guide
+A skill for analyzing particle physics data, covering event reconstruction, histogram analysis, statistical methods for discovery, and the standard tools used in high-energy physics (HEP) research. Includes ROOT, uproot, pyhf, and HEPData workflows.
+## Data Formats and Access
+### HEP Data Ecosystem
+| Format | Description | Typical Size | Access Tool |
+|--------|-------------|-------------|-------------|
+| ROOT (.root) | Columnar binary format, HEP standard | GB-TB | ROOT, uproot |
+| NanoAOD | Compact analysis format (CMS) | ~1 KB/event | uproot, coffea |
+| DAOD_PHYS | Derived analysis format (ATLAS) | ~10 KB/event | ROOT, uproot |
+| HepMC | Monte Carlo event record | Variable | pyhepmc |
+| HEPData | Published results (YAML/JSON) | KB | hepdata_lib |
+### Reading ROOT Files with uproot
+```python
+import uproot
+import awkward as ak
+import numpy as np
+def load_nanoaod(filepath: str, tree_name: str = "Events",
+                  branches: list[str] | None = None) -> ak.Array:
+    """
+    Load a NanoAOD ROOT file into an awkward array.
+    branches: list of branch names to load (None = all)
+    """
+    with uproot.open(filepath) as f:
+        tree = f[tree_name]
+        if branches is None:
+            branches = tree.keys()
+        events = tree.arrays(branches, library="ak")
+    print(f"Loaded {len(events)} events")
+    print(f"Branches: {events.fields}")
+    return events
+# Example: Load muon data
+events = load_nanoaod("nano_data.root", branches=[
+    "nMuon", "Muon_pt", "Muon_eta", "Muon_phi", "Muon_mass",
+    "Muon_charge", "Muon_pfRelIso04_all", "Muon_tightId",
+])
+```
+## Event Selection and Reconstruction
+### Dimuon Invariant Mass
+```python
+def compute_invariant_mass(pt1, eta1, phi1, mass1,
+                            pt2, eta2, phi2, mass2):
+    """
+    Compute invariant mass of a particle pair from 4-momentum components.
+    Uses the relativistic energy-momentum relation.
+    """
+    # Convert to Cartesian 4-vectors
+    px1 = pt1 * np.cos(phi1)
+    py1 = pt1 * np.sin(phi1)
+    pz1 = pt1 * np.sinh(eta1)
+    e1 = np.sqrt(px1**2 + py1**2 + pz1**2 + mass1**2)
+    px2 = pt2 * np.cos(phi2)
+    py2 = pt2 * np.sin(phi2)
+    pz2 = pt2 * np.sinh(eta2)
+    e2 = np.sqrt(px2**2 + py2**2 + pz2**2 + mass2**2)
+    # Invariant mass of the pair
+    m_inv = np.sqrt(
+        (e1 + e2)**2 - (px1 + px2)**2 - (py1 + py2)**2 - (pz1 + pz2)**2
+    )
+    return m_inv
+def select_z_candidates(events):
+    """
+    Select Z -> mu+mu- candidates from NanoAOD events.
+    Requires exactly 2 opposite-sign muons passing quality cuts.
+    """
+    # Quality cuts
+    muon_mask = (
+        (events.Muon_pt > 20) &            # pT > 20 GeV
+        (abs(events.Muon_eta) < 2.4) &     # |eta| < 2.4
+        (events.Muon_tightId == True) &     # tight muon ID
+        (events.Muon_pfRelIso04_all < 0.15) # relative isolation
+    )
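+    # Apply the per-muon mask branch by branch: nMuon is per-event, so the
+    # jagged mask cannot index the whole record array at once
+    good_muons = ak.zip(
+        {name: events[name][muon_mask]
+         for name in events.fields if name.startswith("Muon_")},
+        depth_limit=1,
+    )
+    # Require exactly 2 muons passing the cuts
+    dimuon_events = good_muons[ak.num(good_muons.Muon_pt) == 2]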
+    # Opposite sign requirement
+    opposite_sign = (
+        dimuon_events.Muon_charge[:, 0] * dimuon_events.Muon_charge[:, 1] < 0
+    )
+    z_candidates = dimuon_events[opposite_sign]
+    # Compute invariant mass
+    m_inv = compute_invariant_mass(
+        z_candidates.Muon_pt[:, 0], z_candidates.Muon_eta[:, 0],
+        z_candidates.Muon_phi[:, 0], z_candidates.Muon_mass[:, 0],
+        z_candidates.Muon_pt[:, 1], z_candidates.Muon_eta[:, 1],
+        z_candidates.Muon_phi[:, 1], z_candidates.Muon_mass[:, 1],
+    )
+    return m_inv
+```
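As a quick sanity check on the formula above, the sketch below evaluates the same four-vector construction for a single pair using only the standard library, and compares it with the common massless approximation m^2 = 2 pT1 pT2 (cosh(eta1 - eta2) - cos(phi1 - phi2)). The helper names and the back-to-back muon values are illustrative assumptions, not part of the skill.

```python
import math

MUON_MASS_GEV = 0.10566  # muon rest mass in GeV


def pair_mass(pt1, eta1, phi1, m1, pt2, eta2, phi2, m2):
    """Invariant mass of two particles from (pt, eta, phi, mass)."""
    def four_vector(pt, eta, phi, m):
        px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
        return math.sqrt(px**2 + py**2 + pz**2 + m**2), px, py, pz

    e1, px1, py1, pz1 = four_vector(pt1, eta1, phi1, m1)
    e2, px2, py2, pz2 = four_vector(pt2, eta2, phi2, m2)
    m_squared = ((e1 + e2)**2 - (px1 + px2)**2
                 - (py1 + py2)**2 - (pz1 + pz2)**2)
    # Clamp tiny negative values caused by floating-point rounding
    return math.sqrt(max(m_squared, 0.0))


def pair_mass_massless(pt1, eta1, phi1, pt2, eta2, phi2):
    """Massless approximation, valid when pt >> particle mass."""
    return math.sqrt(
        2 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    )


# Two back-to-back 45.6 GeV muons in the transverse plane
exact = pair_mass(45.6, 0.0, 0.0, MUON_MASS_GEV,
                  45.6, 0.0, math.pi, MUON_MASS_GEV)
approx = pair_mass_massless(45.6, 0.0, 0.0, 45.6, 0.0, math.pi)
print(f"exact = {exact:.3f} GeV, massless approximation = {approx:.3f} GeV")
```

Both values should come out close to 91.2 GeV, since the muon mass is negligible at these momenta.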
+## Statistical Methods for Discovery
+### Hypothesis Testing with pyhf
+```python
+import pyhf
+def build_counting_model(signal: float, background: float,
+                          bkg_uncertainty: float) -> dict:
+    """
+    Build a simple counting experiment model in pyhf.
+    signal: expected signal yield
+    background: expected background yield
+    bkg_uncertainty: relative uncertainty on background
+    """
+    model = pyhf.simplemodels.uncorrelated_background(
+        signal=[signal],
+        bkg=[background],
+        bkg_uncertainty=[bkg_uncertainty * background],
+    )
+    # Observed data (background-only for expected limit)
+    data = [background] + model.config.auxdata
+    return {"model": model, "data": data}
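+def compute_cls(model, data, poi_values=None):
+    """
+    Scan CLs values over the signal strength (frequentist hypothesis test).
+    Uses the CLs method standard in HEP.
+    model: dict returned by build_counting_model (uses model["model"])
+    data: observed main-channel counts followed by auxiliary data
+    Signal strengths with CLs < 0.05 are excluded at 95% CL.
+    """
+    if poi_values is None:
+        poi_values = np.linspace(0, 5, 50)
+    # Accept plain lists as well as arrays (needed for .tolist() below)
+    poi_values = np.asarray(poi_values, dtype=float)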
+    obs_cls = []
+    exp_cls = []
+    for mu in poi_values:
+        result = pyhf.infer.hypotest(
+            mu, data, model["model"],
+            test_stat="qtilde",
+            return_expected_set=True,
+        )
+        obs_cls.append(float(result[0]))
+        exp_cls.append([float(v) for v in result[1]])
+    return {
+        "poi_values": poi_values.tolist(),
+        "observed_cls": obs_cls,
+        "expected_cls": exp_cls,
+    }
+```
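A scan like the one returned by `compute_cls` is usually reduced to a single upper limit: the signal strength at which CLs drops to 0.05. The sketch below finds that crossing by linear interpolation. `upper_limit_from_scan` and the CLs values in the example are made up for illustration and do not come from a real fit.

```python
def upper_limit_from_scan(poi_values, cls_values, alpha=0.05):
    """
    Return the smallest POI value where CLs crosses below alpha,
    using linear interpolation between neighbouring scan points.
    Returns None if the scan never crosses alpha.
    """
    points = list(zip(poi_values, cls_values))
    for (mu_lo, cls_lo), (mu_hi, cls_hi) in zip(points, points[1:]):
        if cls_lo >= alpha > cls_hi:
            fraction = (cls_lo - alpha) / (cls_lo - cls_hi)
            return mu_lo + fraction * (mu_hi - mu_lo)
    return None


# Hypothetical scan: CLs falls as the tested signal strength grows
scan_mu = [0.0, 1.0, 2.0, 3.0]
scan_cls = [1.0, 0.4, 0.1, 0.02]
limit = upper_limit_from_scan(scan_mu, scan_cls)
print(f"95% CL upper limit on mu: {limit:.3f}")
```

A finer scan near the crossing reduces the interpolation error; if the function returns None, the scanned range was too narrow to exclude any signal strength.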
+### Discovery Significance
+```python
+def discovery_significance(n_observed: float, n_background: float,
+                            sigma_b: float = 0) -> dict:
+    """
+    Compute discovery significance for a counting experiment.
+    n_observed: number of observed events
+    n_background: expected background
+    sigma_b: uncertainty on background
+    """
+    from scipy.stats import norm
+    if n_observed <= n_background:
+        # No excess over background: significance is zero by convention
+        z = 0.0
+    elif sigma_b == 0:
+        # Simple Poisson significance
+        # Z = sqrt(2 * (n * ln(n/b) - (n - b)))
+        z = np.sqrt(2 * (
+            n_observed * np.log(n_observed / n_background)
+            - (n_observed - n_background)
+        ))
+    else:
+        # With background uncertainty (profile likelihood approximation);
+        # the formula is written in terms of the variance sigma_b**2
+        sigma2 = sigma_b**2
+        n = n_observed
+        b = n_background
+        z = np.sqrt(2 * (
+            n * np.log((n * (b + sigma2)) / (b**2 + n * sigma2))
+            - (b**2 / sigma2) * np.log(1 + sigma2 * (n - b) / (b * (b + sigma2)))
+        ))
+    # Survival function avoids the precision loss of 1 - cdf at large z
+    p_value = norm.sf(z)
+    return {
+        "z_significance": round(z, 4),
+        "p_value": p_value,
+        "is_evidence": z >= 3.0,       # 3 sigma = evidence
+        "is_discovery": z >= 5.0,      # 5 sigma = discovery
+    }
+```
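The two cases in `discovery_significance` are related: the expression with a background uncertainty should reduce to the simple Poisson one as the uncertainty goes to zero, and it should give a smaller Z as the uncertainty grows. The sketch below checks this numerically with the standard library, writing the systematic formula in terms of the background variance sigma_b^2. The function names and the counts n = 20, b = 10 are illustrative assumptions.

```python
import math


def z_poisson(n, b):
    """Asymptotic significance with an exactly known background b."""
    if n <= b:
        return 0.0
    return math.sqrt(2 * (n * math.log(n / b) - (n - b)))


def z_with_uncertainty(n, b, sigma_b):
    """Asymptotic significance with an absolute uncertainty sigma_b on b."""
    if n <= b:
        return 0.0
    var = sigma_b**2
    term1 = n * math.log(n * (b + var) / (b**2 + n * var))
    term2 = (b**2 / var) * math.log1p(var * (n - b) / (b * (b + var)))
    return math.sqrt(2 * (term1 - term2))


n_obs, bkg = 20, 10
for sigma_b in (0.01, 1.0, 3.0):
    z_sys = z_with_uncertainty(n_obs, bkg, sigma_b)
    print(f"sigma_b = {sigma_b:>4}: Z = {z_sys:.3f}")
print(f"no uncertainty : Z = {z_poisson(n_obs, bkg):.3f}")
```

The printed values should decrease as `sigma_b` grows, with the smallest uncertainty nearly reproducing the Poisson result.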
+## Histogram Analysis
+### Binned Fitting
+```python
+from scipy.optimize import curve_fit
+def fit_breit_wigner_plus_bg(bin_centers: np.ndarray,
+                               bin_contents: np.ndarray,
+                               mass_range: tuple = (80, 100)) -> dict:
+    """
+    Fit a Breit-Wigner (resonance) + polynomial background to a mass histogram.
+    Standard approach for Z boson mass measurement.
+    """
+    def model(m, N_sig, M_Z, Gamma_Z, a0, a1):
+        # Breit-Wigner
+        bw = N_sig * Gamma_Z / (2 * np.pi) / (
+            (m - M_Z)**2 + (Gamma_Z / 2)**2
+        )
+        # Linear background
+        bg = a0 + a1 * (m - 91.0)
+        return bw + bg
+    mask = (bin_centers >= mass_range[0]) & (bin_centers <= mass_range[1])
+    x = bin_centers[mask]
+    y = bin_contents[mask]
+    p0 = [1000, 91.2, 2.5, 10, 0]  # initial guess
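+    # sigma holds per-bin uncertainties; absolute_sigma=True keeps pcov in
+    # absolute units instead of rescaling it by the reduced chi-square
+    popt, pcov = curve_fit(model, x, y, p0=p0, sigma=np.sqrt(y + 1),
+                           absolute_sigma=True)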
+    perr = np.sqrt(np.diag(pcov))
+    return {
+        "M_Z": f"{popt[1]:.3f} +/- {perr[1]:.3f} GeV",
+        "Gamma_Z": f"{popt[2]:.3f} +/- {perr[2]:.3f} GeV",
+        "N_signal": f"{popt[0]:.0f} +/- {perr[0]:.0f}",
+        "chi2_ndf": round(np.sum(((y - model(x, *popt))**2 / (y + 1))) / (len(x) - 5), 2),
+    }
+```
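The signal term in the fit model is a Lorentzian normalized to unit area, so `Gamma_Z` is its full width at half maximum and `N_sig` scales the total area. A minimal standard-library check of both properties follows; the function name, integration grid, and parameter values are illustrative assumptions. Because the model is a density in mass, `N_sig` matches an event yield only when bin contents are counts per 1 GeV bin; for another bin width, the fitted value would need to be divided by that width.

```python
import math


def breit_wigner(m, n_sig, m_z, gamma_z):
    """Non-relativistic Breit-Wigner (Lorentzian) with area n_sig."""
    return n_sig * gamma_z / (2 * math.pi) / ((m - m_z)**2 + (gamma_z / 2)**2)


n_sig, m_z, gamma_z = 1000.0, 91.2, 2.5

# Half maximum is reached at m_z +/- gamma_z / 2
peak = breit_wigner(m_z, n_sig, m_z, gamma_z)
half = breit_wigner(m_z + gamma_z / 2, n_sig, m_z, gamma_z)
print(f"value at half width / peak = {half / peak:.3f}")

# Trapezoidal integral over a wide window; the long tails mean a
# finite window always contains slightly less than the full area
step, half_window = 0.01, 500.0
n_steps = int(2 * half_window / step)
grid = [m_z - half_window + i * step for i in range(n_steps + 1)]
values = [breit_wigner(m, n_sig, m_z, gamma_z) for m in grid]
area = step * (sum(values) - 0.5 * (values[0] + values[-1]))
print(f"area / n_sig over +/-{half_window:.0f} GeV = {area / n_sig:.4f}")

# Fraction of the area inside an 80-100 GeV fit window (analytic)
inside = (math.atan(2 * (100 - m_z) / gamma_z)
          - math.atan(2 * (80 - m_z) / gamma_z)) / math.pi
print(f"fraction inside 80-100 GeV = {inside:.3f}")
```

The last number is a reminder that a sizeable share of a Lorentzian lies outside a 20 GeV window, which is one reason the fitted normalization and the background terms are correlated.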
+## Monte Carlo Simulation
+### Event Generation Pipeline
+```
+1. Matrix element calculation (MadGraph, Sherpa, POWHEG)
+   --> Hard scattering process (e.g., pp -> Z -> mu+mu-)
+2. Parton shower (Pythia, Herwig)
+   --> QCD radiation, initial/final state radiation
+3. Hadronization (Pythia string model, Herwig cluster model)
+   --> Quarks/gluons -> hadrons
+4. Detector simulation (Geant4 via CMSSW/Athena, or Delphes for fast sim)
+   --> Particle interactions with detector material
+5. Reconstruction
+   --> Raw hits -> tracks, clusters, physics objects
+```
+## Tools and Software
+- **ROOT**: C++ data analysis framework (CERN), ubiquitous in HEP
+- **uproot**: Pure Python ROOT file reader (no ROOT dependency)
+- **awkward-array**: Columnar data with variable-length nested structure
+- **coffea**: Analysis framework built on uproot + awkward + dask
+- **pyhf**: Pure Python HistFactory for statistical models
+- **MadGraph5_aMC@NLO**: Automated matrix element generation
+- **Pythia 8**: Monte Carlo event generator (parton shower + hadronization)
+- **Delphes**: Fast detector simulation framework
+- **HEPData**: Repository for published HEP measurements

package/skills/domains/social-science/network-analysis-guide/SKILL.md ADDED

@@ -0,0 +1,310 @@
+---
+name: network-analysis-guide
+description: "Social network analysis methods, metrics, and visualization tools"
+metadata:
+  openclaw:
+    emoji: "globe_with_meridians"
+    category: "domains"
+    subcategory: "social-science"
+    keywords: ["social network analysis", "graph theory", "centrality", "community detection", "network visualization", "SNA"]
+    source: "wentor-research-plugins"
+---
+# Network Analysis Guide
+A skill for conducting social network analysis (SNA) in research contexts. Covers network data collection and representation, key structural metrics (centrality, density, clustering), community detection algorithms, ego network analysis, longitudinal network models, and visualization best practices using Python NetworkX, igraph, and Gephi.
+## Network Data Fundamentals
+### Representing Network Data
+Networks consist of nodes (actors) and edges (relationships). The first decision in any SNA project is how to represent the data.
+```
+Network data formats:
+Edge List (simplest):
+  source, target, weight
+  Alice, Bob, 3
+  Alice, Carol, 1
+  Bob, David, 5
+Adjacency Matrix (for small networks):
+        Alice  Bob  Carol  David
+  Alice   0     3    1      0
+  Bob     3     0    0      5
+  Carol   1     0    0      0
+  David   0     5    0      0
+Network types:
+  Undirected: friendship, co-authorship, physical contact
+  Directed: email, citation, following on social media
+  Weighted: frequency of interaction, strength of tie
+  Bipartite: two types of nodes (e.g., people and events)
+  Multiplex: multiple types of edges between same nodes
+  Temporal: edges have timestamps or time windows
+```
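The edge list and the adjacency matrix above carry the same information for an undirected weighted network. The sketch below converts the example edge list into the matrix using plain Python; `edge_list_to_matrix` is a hypothetical helper written for this illustration.

```python
def edge_list_to_matrix(edges, directed=False):
    """
    Convert (source, target, weight) tuples into an adjacency matrix.
    Returns the sorted node labels and a list-of-lists matrix.
    """
    nodes = sorted({name for source, target, _ in edges
                    for name in (source, target)})
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target, weight in edges:
        matrix[index[source]][index[target]] = weight
        if not directed:
            # Undirected ties are stored symmetrically
            matrix[index[target]][index[source]] = weight
    return nodes, matrix


edges = [("Alice", "Bob", 3), ("Alice", "Carol", 1), ("Bob", "David", 5)]
nodes, matrix = edge_list_to_matrix(edges)
print("       " + "  ".join(f"{name:>5}" for name in nodes))
for name, row in zip(nodes, matrix):
    print(f"{name:<7}" + "  ".join(f"{value:>5}" for value in row))
```

Passing `directed=True` fills only the source-to-target cell, which is the form needed for email or citation data.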
+### Data Collection Methods
+```
+Common SNA data collection approaches:
+Survey-based (name generators):
+  "List up to 5 people you go to for work advice."
+  Advantages: captures subjective relationship perception
+  Limitations: recall bias, boundary specification problem
+  Best for: organizational networks, personal networks
+Archival data:
+  Email logs, collaboration records, co-authorship
+  Advantages: objective, complete within data boundaries
+  Limitations: may not reflect relationship quality
+  Best for: large-scale communication networks
+Observation:
+  Systematic recording of interactions
+  Advantages: captures actual behavior
+  Limitations: time-intensive, observer effects
+  Best for: small groups, classroom networks
+Digital trace data:
+  Social media follows, retweets, mentions
+  Advantages: large-scale, timestamped
+  Limitations: platform-specific behavior, findings may not generalize beyond the platform
+  Best for: online community studies
+Important considerations:
+  - Boundary specification: who is included in the network?
+  - Complete vs sampled networks require different methods
+  - IRB/ethics approval needed for human subjects research
+  - Node anonymization required for publication
+```
+## Core Network Metrics
+### Node-Level Centrality
+```python
+import networkx as nx
+def compute_centrality_measures(G):
+    """
+    Compute the four classic centrality measures for all nodes.
+    Each captures a different dimension of node importance:
+    - Degree: connectivity (popular nodes)
+    - Betweenness: brokerage (bridge nodes)
+    - Closeness: reachability (efficient nodes)
+    - Eigenvector: prestige (connected to important nodes)
+    """
+    centralities = {}
+    # Degree centrality: proportion of nodes connected to
+    centralities["degree"] = nx.degree_centrality(G)
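+    # Betweenness: proportion of shortest paths through node
+    # (NetworkX treats "weight" as a distance here, so convert tie strength
+    # to a cost such as 1/strength before relying on weighted results)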
+    centralities["betweenness"] = nx.betweenness_centrality(
+        G, weight="weight", normalized=True
+    )
+    # Closeness: inverse of average shortest path to all others
+    centralities["closeness"] = nx.closeness_centrality(G)
+    # Eigenvector: connected to other high-centrality nodes
+    try:
+        centralities["eigenvector"] = nx.eigenvector_centrality(
+            G, max_iter=1000, weight="weight"
+        )
+    except nx.PowerIterationFailedConvergence:
+        centralities["eigenvector"] = {}
+    return centralities
+```
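To make the definitions concrete, the sketch below computes degree and closeness centrality by hand for a small unweighted network (Carol - Alice - Bob - David), using only the standard library. It follows the usual normalizations, degree / (n - 1) and (n - 1) / (sum of shortest-path distances), and assumes a connected, undirected graph; the helper names are illustrative.

```python
from collections import deque

adjacency = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Alice", "David"],
    "Carol": ["Alice"],
    "David": ["Bob"],
}


def degree_centrality(adj):
    """Fraction of the other nodes each node is directly tied to."""
    n = len(adj)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


def bfs_distances(adj, start):
    """Shortest-path lengths (in hops) from start to every reachable node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adj[current]:
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    return distances


def closeness_centrality(adj):
    """Inverse of the mean distance to all other nodes (connected graph)."""
    n = len(adj)
    return {
        node: (n - 1) / sum(bfs_distances(adj, node).values())
        for node in adj
    }


degree = degree_centrality(adjacency)
closeness = closeness_centrality(adjacency)
for node in adjacency:
    print(f"{node:<6} degree={degree[node]:.2f} closeness={closeness[node]:.2f}")
```

Alice and Bob sit in the middle of the chain, so they score higher on both measures than Carol and David at the ends.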
+### Network-Level Metrics
+```python
+def compute_network_metrics(G):
+    """
+    Compute network-level structural properties.
+    """
+    metrics = {}
+    n = G.number_of_nodes()
+    m = G.number_of_edges()
+    metrics["nodes"] = n
+    metrics["edges"] = m
+    # Density: actual edges / possible edges
+    metrics["density"] = nx.density(G)
+    # Average clustering coefficient: transitivity tendency
+    metrics["avg_clustering"] = nx.average_clustering(G)
+    # Global clustering (transitivity)
+    metrics["transitivity"] = nx.transitivity(G)
+    # Connected components
+    if G.is_directed():
+        metrics["weakly_connected_components"] = (
+            nx.number_weakly_connected_components(G)
+        )
+    else:
+        metrics["connected_components"] = (
+            nx.number_connected_components(G)
+        )
+        if nx.is_connected(G):
+            metrics["diameter"] = nx.diameter(G)
+            metrics["avg_shortest_path"] = (
+                nx.average_shortest_path_length(G)
+            )
+    # Degree distribution statistics
+    degrees = [d for _, d in G.degree()]  # "_" avoids shadowing n above
+    metrics["avg_degree"] = sum(degrees) / len(degrees)
+    metrics["max_degree"] = max(degrees)
+    return metrics
+def interpret_metrics(metrics):
+    """
+    Provide interpretive context for network metrics.
+    """
+    interpretations = []
+    if metrics["density"] > 0.5:
+        interpretations.append(
+            "High density: most actors are connected. "
+            "Information spreads quickly but network is "
+            "resource-intensive to maintain."
+        )
+    elif metrics["density"] < 0.1:
+        interpretations.append(
+            "Low density: sparse connections. Network "
+            "may have structural holes and brokerage "
+            "opportunities."
+        )
+    if metrics["avg_clustering"] > 0.5:
+        interpretations.append(
+            "High clustering: strong tendency to form "
+            "closed triads. Indicates group cohesion "
+            "and potential echo chambers."
+        )
+    return interpretations
+```
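Average clustering and transitivity are both reported by `compute_network_metrics` but can differ on the same graph: the first averages each node's local coefficient, while the second is the ratio of closed to all connected triples. The sketch below computes density and both clustering measures by hand for a hypothetical four-node undirected graph (a triangle A-B-C with a pendant node D attached to C).

```python
from itertools import combinations

adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

n = len(adjacency)
m = sum(len(neighbors) for neighbors in adjacency.values()) // 2

# Density of an undirected graph: actual edges / possible edges
density = 2 * m / (n * (n - 1))

local_clustering = {}
closed_triples = 0   # connected triples whose end points are also tied
all_triples = 0      # all connected triples (paths of length 2)
for node, neighbors in adjacency.items():
    pairs = list(combinations(sorted(neighbors), 2))
    closed = sum(1 for u, v in pairs if v in adjacency[u])
    local_clustering[node] = closed / len(pairs) if pairs else 0.0
    closed_triples += closed
    all_triples += len(pairs)

avg_clustering = sum(local_clustering.values()) / n
transitivity = closed_triples / all_triples

print(f"density={density:.3f}")
print(f"avg_clustering={avg_clustering:.3f}, transitivity={transitivity:.3f}")
```

The two clustering values differ here because the average weights every node equally, while transitivity weights nodes by how many triples they sit at the center of.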
+## Community Detection
+### Algorithms for Finding Groups
+```python
+import community as community_louvain  # provided by the python-louvain package
+def detect_communities_multiple(G):
+    """
+    Apply multiple community detection algorithms and compare.
+    Different algorithms may reveal different structural patterns.
+    """
+    results = {}
+    # Louvain method (modularity optimization)
+    results["louvain"] = community_louvain.best_partition(
+        G, weight="weight"
+    )
+    results["louvain_modularity"] = (
+        community_louvain.modularity(results["louvain"], G)
+    )
+    # Label Propagation (fast; semi-synchronous variant, undirected graphs only)
+    lp_communities = nx.community.label_propagation_communities(G)
+    lp_partition = {}
+    for i, comm in enumerate(lp_communities):
+        for node in comm:
+            lp_partition[node] = i
+    results["label_propagation"] = lp_partition
+    # Girvan-Newman (edge betweenness, slow but interpretable)
+    # Only practical for small networks, so it is skipped at 500+ nodes
+    if G.number_of_nodes() < 500:
+        gn_communities = nx.community.girvan_newman(G)
+        top_level = next(gn_communities)
+        gn_partition = {}
+        for i, comm in enumerate(top_level):
+            for node in comm:
+                gn_partition[node] = i
+        results["girvan_newman"] = gn_partition
+    return results
+```
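The modularity score that Louvain optimizes can be computed directly from its definition, Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where L_c is the number of edges inside community c, d_c the total degree of its nodes, and m the number of edges. The sketch below evaluates it for an unweighted, undirected toy graph (two triangles joined by one bridge); the graph and function name are assumptions made for illustration.

```python
def modularity(edges, partition):
    """
    Newman modularity of a node -> community mapping for an
    unweighted, undirected graph given as a list of (u, v) edges.
    """
    m = len(edges)
    internal_edges = {}   # L_c: edges with both ends in community c
    total_degree = {}     # d_c: summed degree of nodes in community c
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        total_degree[cu] = total_degree.get(cu, 0) + 1
        total_degree[cv] = total_degree.get(cv, 0) + 1
        if cu == cv:
            internal_edges[cu] = internal_edges.get(cu, 0) + 1
    return sum(
        internal_edges.get(c, 0) / m - (d_c / (2 * m)) ** 2
        for c, d_c in total_degree.items()
    )


# Two triangles (0-1-2 and 3-4-5) joined by the bridge 2-3
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}

print(f"two communities: Q = {modularity(edges, two_groups):.4f}")
print(f"one community:   Q = {modularity(edges, one_group):.4f}")
```

Putting every node in one community always gives Q = 0, which is why modularity is read relative to that baseline when comparing the partitions returned by different algorithms.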
+## Ego Network Analysis
+### Analyzing Individual Networks
+```
+Ego network concepts:
+Ego: the focal actor
+Alters: ego's direct contacts
+Alter-alter ties: direct connections among alters (not via ego)
+Key ego network measures:
+  - Size: number of alters
+  - Density: proportion of possible alter-alter ties that exist
+  - Constraint: Burt's measure of structural holes
+    - Low constraint = access to diverse information
+    - High constraint = redundant contacts
+  - Effective size: size minus redundancy of contacts
+  - Ego betweenness: brokerage within the ego network
+Research applications:
+  - Social support and health outcomes
+  - Innovation diffusion and adoption
+  - Career success and social capital
+  - Information access and decision-making
+```
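The size, density, and effective size measures listed above can be computed directly from an ego's alters and the ties among them. The sketch below assumes unweighted, undirected ties and uses Borgatti's simplification of Burt's effective size, n - 2t/n, where n is the number of alters and t the number of alter-alter ties; the ego network in the example is hypothetical.

```python
def ego_measures(alters, alter_ties):
    """
    Basic ego network measures for unweighted, undirected ties.
    alters: list of ego's direct contacts
    alter_ties: list of (alter, alter) pairs that are tied to each other
    """
    n = len(alters)
    t = len(alter_ties)
    possible = n * (n - 1) / 2
    return {
        "size": n,
        "density": t / possible if possible else 0.0,
        # Each alter-alter tie makes two contacts partly redundant
        "effective_size": n - 2 * t / n if n else 0.0,
    }


# Hypothetical ego with four alters; only a-b and b-c know each other
measures = ego_measures(["a", "b", "c", "d"], [("a", "b"), ("b", "c")])
print(measures)
```

An effective size well below the raw size signals that many contacts are redundant, which corresponds to the high-constraint case described above.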
+## Visualization Best Practices
+### Layout and Design
+```
+Network visualization guidelines:
+Layout algorithms:
+  - Force-directed (Fruchterman-Reingold, ForceAtlas2):
+    Best for: showing clusters, general structure
+    Use when: exploring data, presenting to general audience
+  - Circular: Best for: showing connectivity patterns
+    Use when: comparing density across groups
+  - Hierarchical (Sugiyama): Best for: directed acyclic graphs
+    Use when: showing flow or hierarchy
+Visual encoding:
+  - Node size: proportional to centrality or attribute value
+  - Node color: community membership or categorical attribute
+  - Edge width: relationship strength or frequency
+  - Edge color: relationship type (in multiplex networks)
+Publication standards:
+  - Use colorblind-friendly palettes
+  - Include a legend for all visual encodings
+  - Report the layout algorithm used
+  - State N (nodes) and M (edges) in the caption
+  - For large networks, consider filtering to top-k nodes
+  - Provide the network data in supplementary materials
+Tools:
+  - Gephi: interactive exploration, ForceAtlas2 layout
+  - Python pyvis: interactive HTML visualizations
+  - R igraph: publication-quality static figures
+  - Cytoscape: biological networks, rich plugin ecosystem
+```
+Social network analysis provides a structural perspective on social phenomena that complements traditional individual-level analyses. By examining patterns of relationships rather than attributes of individuals, SNA reveals how position in a social structure shapes behavior, information access, influence, and outcomes.