ma-agents 3.2.0 → 3.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.opencode/skills/.ma-agents.json +99 -99
- package/.roo/rules/00-ma-agents.md +13 -0
- package/.roo/skills/.ma-agents.json +241 -0
- package/.roo/skills/MANIFEST.yaml +254 -0
- package/.roo/skills/ai-audit-trail/SKILL.md +23 -0
- package/.roo/skills/auto-bug-detection/SKILL.md +169 -0
- package/.roo/skills/cmake-best-practices/SKILL.md +64 -0
- package/.roo/skills/cmake-best-practices/examples/cmake.md +59 -0
- package/.roo/skills/code-documentation/SKILL.md +57 -0
- package/.roo/skills/code-documentation/examples/cpp.md +29 -0
- package/.roo/skills/code-documentation/examples/csharp.md +28 -0
- package/.roo/skills/code-documentation/examples/javascript_typescript.md +28 -0
- package/.roo/skills/code-documentation/examples/python.md +57 -0
- package/.roo/skills/code-review/SKILL.md +43 -0
- package/.roo/skills/commit-message/SKILL.md +79 -0
- package/.roo/skills/cpp-best-practices/SKILL.md +234 -0
- package/.roo/skills/cpp-best-practices/examples/modern-idioms.md +189 -0
- package/.roo/skills/cpp-best-practices/examples/naming-and-organization.md +102 -0
- package/.roo/skills/cpp-concurrency-safety/SKILL.md +60 -0
- package/.roo/skills/cpp-concurrency-safety/examples/concurrency.md +73 -0
- package/.roo/skills/cpp-const-correctness/SKILL.md +63 -0
- package/.roo/skills/cpp-const-correctness/examples/const_correctness.md +54 -0
- package/.roo/skills/cpp-memory-handling/SKILL.md +42 -0
- package/.roo/skills/cpp-memory-handling/examples/modern-cpp.md +49 -0
- package/.roo/skills/cpp-memory-handling/examples/smart-pointers.md +46 -0
- package/.roo/skills/cpp-modern-composition/SKILL.md +64 -0
- package/.roo/skills/cpp-modern-composition/examples/composition.md +51 -0
- package/.roo/skills/cpp-robust-interfaces/SKILL.md +55 -0
- package/.roo/skills/cpp-robust-interfaces/examples/interfaces.md +56 -0
- package/.roo/skills/create-hardened-docker-skill/SKILL.md +637 -0
- package/.roo/skills/create-hardened-docker-skill/scripts/create-all.sh +489 -0
- package/.roo/skills/csharp-best-practices/SKILL.md +278 -0
- package/.roo/skills/docker-hardening-verification/SKILL.md +28 -0
- package/.roo/skills/docker-hardening-verification/scripts/verify-hardening.sh +39 -0
- package/.roo/skills/docker-image-signing/SKILL.md +28 -0
- package/.roo/skills/docker-image-signing/scripts/sign-image.sh +33 -0
- package/.roo/skills/document-revision-history/SKILL.md +104 -0
- package/.roo/skills/git-workflow-skill/SKILL.md +194 -0
- package/.roo/skills/git-workflow-skill/hooks/commit-msg +61 -0
- package/.roo/skills/git-workflow-skill/hooks/pre-commit +38 -0
- package/.roo/skills/git-workflow-skill/hooks/prepare-commit-msg +56 -0
- package/.roo/skills/git-workflow-skill/scripts/finish-feature.sh +192 -0
- package/.roo/skills/git-workflow-skill/scripts/install-hooks.sh +55 -0
- package/.roo/skills/git-workflow-skill/scripts/start-feature.sh +110 -0
- package/.roo/skills/git-workflow-skill/scripts/validate-workflow.sh +229 -0
- package/.roo/skills/js-ts-dependency-mgmt/SKILL.md +49 -0
- package/.roo/skills/js-ts-dependency-mgmt/examples/dependency_mgmt.md +60 -0
- package/.roo/skills/js-ts-security-skill/SKILL.md +64 -0
- package/.roo/skills/js-ts-security-skill/scripts/verify-security.sh +136 -0
- package/.roo/skills/logging-best-practices/SKILL.md +50 -0
- package/.roo/skills/logging-best-practices/examples/cpp.md +36 -0
- package/.roo/skills/logging-best-practices/examples/csharp.md +49 -0
- package/.roo/skills/logging-best-practices/examples/javascript.md +77 -0
- package/.roo/skills/logging-best-practices/examples/python.md +57 -0
- package/.roo/skills/logging-best-practices/references/logging-standards.md +29 -0
- package/.roo/skills/open-presentation/SKILL.md +35 -0
- package/.roo/skills/opentelemetry-best-practices/SKILL.md +34 -0
- package/.roo/skills/opentelemetry-best-practices/examples/go.md +32 -0
- package/.roo/skills/opentelemetry-best-practices/examples/javascript.md +58 -0
- package/.roo/skills/opentelemetry-best-practices/examples/python.md +37 -0
- package/.roo/skills/opentelemetry-best-practices/references/otel-standards.md +37 -0
- package/.roo/skills/python-best-practices/SKILL.md +385 -0
- package/.roo/skills/python-dependency-mgmt/SKILL.md +42 -0
- package/.roo/skills/python-dependency-mgmt/examples/dependency_mgmt.md +67 -0
- package/.roo/skills/python-security-skill/SKILL.md +56 -0
- package/.roo/skills/python-security-skill/examples/security.md +56 -0
- package/.roo/skills/self-signed-cert/SKILL.md +42 -0
- package/.roo/skills/self-signed-cert/scripts/generate-cert.ps1 +45 -0
- package/.roo/skills/self-signed-cert/scripts/generate-cert.sh +43 -0
- package/.roo/skills/skill-creator/SKILL.md +196 -0
- package/.roo/skills/skill-creator/references/output-patterns.md +82 -0
- package/.roo/skills/skill-creator/references/workflows.md +28 -0
- package/.roo/skills/skill-creator/scripts/init_skill.py +208 -0
- package/.roo/skills/skill-creator/scripts/package_skill.py +99 -0
- package/.roo/skills/skill-creator/scripts/quick_validate.py +113 -0
- package/.roo/skills/story-status-lookup/SKILL.md +78 -0
- package/.roo/skills/test-accompanied-development/SKILL.md +50 -0
- package/.roo/skills/test-generator/SKILL.md +65 -0
- package/.roo/skills/vercel-react-best-practices/SKILL.md +109 -0
- package/.roo/skills/verify-hardened-docker-skill/SKILL.md +442 -0
- package/.roo/skills/verify-hardened-docker-skill/scripts/verify-docker-hardening.sh +439 -0
- package/README.md +21 -2
- package/bin/cli.js +55 -0
- package/lib/agents.js +46 -0
- package/lib/bmad-cache/cache-manifest.json +1 -1
- package/lib/bmad-customizations/bmm-demerzel.customize.yaml +36 -0
- package/lib/bmad-customizations/demerzel.md +32 -0
- package/lib/bmad-extension/module-help.csv +13 -0
- package/lib/bmad-extension/skills/bmad-ma-agent-ml/.gitkeep +0 -0
- package/lib/bmad-extension/skills/bmad-ma-agent-ml/SKILL.md +59 -0
- package/lib/bmad-extension/skills/bmad-ma-agent-ml/bmad-skill-manifest.yaml +11 -0
- package/lib/bmad-extension/skills/generate-backlog/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-advise/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-advise/SKILL.md +76 -0
- package/lib/bmad-extension/skills/ml-advise/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-advise/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-analysis/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-analysis/SKILL.md +60 -0
- package/lib/bmad-extension/skills/ml-analysis/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-analysis/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-architecture/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-architecture/SKILL.md +55 -0
- package/lib/bmad-extension/skills/ml-architecture/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-architecture/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-detailed-design/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-detailed-design/SKILL.md +67 -0
- package/lib/bmad-extension/skills/ml-detailed-design/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-detailed-design/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-eda/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-eda/SKILL.md +56 -0
- package/lib/bmad-extension/skills/ml-eda/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-eda/scripts/baseline_classifier.py +522 -0
- package/lib/bmad-extension/skills/ml-eda/scripts/class_weights_calculator.py +295 -0
- package/lib/bmad-extension/skills/ml-eda/scripts/clustering_explorer.py +383 -0
- package/lib/bmad-extension/skills/ml-eda/scripts/eda_analyzer.py +654 -0
- package/lib/bmad-extension/skills/ml-eda/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-experiment/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-experiment/SKILL.md +74 -0
- package/lib/bmad-extension/skills/ml-experiment/assets/advanced_trainer_configs.py +430 -0
- package/lib/bmad-extension/skills/ml-experiment/assets/quick_trainer_setup.py +233 -0
- package/lib/bmad-extension/skills/ml-experiment/assets/template_datamodule.py +219 -0
- package/lib/bmad-extension/skills/ml-experiment/assets/template_gnn_module.py +341 -0
- package/lib/bmad-extension/skills/ml-experiment/assets/template_lightning_module.py +158 -0
- package/lib/bmad-extension/skills/ml-experiment/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-experiment/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-hparam/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-hparam/SKILL.md +81 -0
- package/lib/bmad-extension/skills/ml-hparam/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-hparam/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-ideation/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-ideation/SKILL.md +50 -0
- package/lib/bmad-extension/skills/ml-ideation/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-ideation/scripts/validate_ml_prd.py +287 -0
- package/lib/bmad-extension/skills/ml-ideation/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-infra/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-infra/SKILL.md +58 -0
- package/lib/bmad-extension/skills/ml-infra/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-infra/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-retrospective/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-retrospective/SKILL.md +63 -0
- package/lib/bmad-extension/skills/ml-retrospective/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-retrospective/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-revision/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-revision/SKILL.md +82 -0
- package/lib/bmad-extension/skills/ml-revision/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-revision/skill.json +7 -0
- package/lib/bmad-extension/skills/ml-techspec/.gitkeep +0 -0
- package/lib/bmad-extension/skills/ml-techspec/SKILL.md +80 -0
- package/lib/bmad-extension/skills/ml-techspec/bmad-skill-manifest.yaml +3 -0
- package/lib/bmad-extension/skills/ml-techspec/skill.json +7 -0
- package/lib/bmad.js +85 -8
- package/lib/skill-authoring.js +1 -1
- package/package.json +5 -4
- package/test/agent-injection-strategy.test.js +4 -4
- package/test/bmad-version-bump.test.js +34 -34
- package/test/build-bmad-args.test.js +13 -6
- package/test/convert-agents-to-skills.test.js +11 -1
- package/test/extension-module-restructure.test.js +31 -7
- package/test/migration-validation.test.js +14 -11
- package/test/roo-code-agent.test.js +166 -0
- package/test/roo-code-injection.test.js +172 -0
package/lib/bmad-extension/skills/ml-experiment/assets/template_gnn_module.py
@@ -0,0 +1,341 @@
"""
template_gnn_module.py — BMAD DL Lifecycle
(Inspired by K-Dense claude-scientific-skills/pytorch-geometric/)

PyTorch Geometric (PyG) template for Graph Neural Network tasks.
Covers GCN, GAT, GraphSAGE, and GIN architectures for:
- Node classification (predicting labels for each node in a graph)
- Graph classification (predicting a label for an entire graph)

Use this for defect detection on circuit graphs, molecular property prediction,
social network analysis, or any graph-structured data.

Requires: pip install torch-geometric

Usage:
    Copy this file to src/models/your_gnn.py and:
    1. Choose an architecture (GCN / GAT / GraphSAGE / GIN)
    2. Set in_channels to your node feature dimension
    3. Set out_channels to number of classes
    4. Wrap with your LightningModule or use train/test helpers directly
"""

from __future__ import annotations

from typing import Optional

try:
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    HAS_TORCH = True
except ImportError:
    HAS_TORCH = False
    raise ImportError("Install PyTorch: pip install torch")

try:
    from torch_geometric.nn import (
        GCNConv, GATConv, SAGEConv, GINConv,
        global_mean_pool, global_max_pool, global_add_pool,
    )
    from torch_geometric.data import Data
    from torch_geometric.loader import DataLoader  # moved out of .data in PyG 2.x
    HAS_PYG = True
except ImportError:
    HAS_PYG = False
    raise ImportError(
        "Install PyTorch Geometric: pip install torch-geometric\n"
        "See: https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html"
    )


# ══════════════════════════════════════════════════════════════════════════════
# Architecture 1: GCN — Graph Convolutional Network
# Best for: Homophilic graphs (connected nodes tend to share labels)
# ══════════════════════════════════════════════════════════════════════════════

class GCN(nn.Module):
    """
    Graph Convolutional Network (Kipf & Welling, 2017).

    Args:
        in_channels: Dimension of input node features.
        hidden_channels: Hidden layer dimension.
        out_channels: Number of output classes.
        num_layers: Number of GCN layers (2-4 recommended).
        dropout: Dropout rate.
        task: 'node' or 'graph' classification.
    """
    def __init__(
        self, in_channels: int, hidden_channels: int, out_channels: int,
        num_layers: int = 3, dropout: float = 0.5, task: str = "node",
    ):
        super().__init__()
        self.task = task
        self.dropout = dropout

        self.convs = nn.ModuleList()
        self.bns = nn.ModuleList()

        for i in range(num_layers):
            in_ch = in_channels if i == 0 else hidden_channels
            self.convs.append(GCNConv(in_ch, hidden_channels))
            self.bns.append(nn.BatchNorm1d(hidden_channels))

        self.classifier = nn.Linear(hidden_channels, out_channels)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor,
                batch: Optional[torch.Tensor] = None) -> torch.Tensor:
        for conv, bn in zip(self.convs[:-1], self.bns[:-1]):
            x = F.relu(bn(conv(x, edge_index)))
            x = F.dropout(x, p=self.dropout, training=self.training)

        x = self.convs[-1](x, edge_index)

        if self.task == "graph":
            x = global_mean_pool(x, batch)

        return self.classifier(x)


# ══════════════════════════════════════════════════════════════════════════════
# Architecture 2: GAT — Graph Attention Network
# Best for: Graphs where some neighbors are more important than others
# ══════════════════════════════════════════════════════════════════════════════

class GAT(nn.Module):
    """
    Graph Attention Network (Veličković et al., 2018).
    Multi-head attention assigns different importance to each neighbor.

    Args:
        in_channels: Dimension of input node features.
        hidden_channels: Hidden layer dimension per head.
        out_channels: Number of output classes.
        heads: Number of attention heads (4-8 recommended).
        dropout: Dropout rate (applied to attention weights too).
        task: 'node' or 'graph' classification.
    """
    def __init__(
        self, in_channels: int, hidden_channels: int, out_channels: int,
        heads: int = 4, dropout: float = 0.5, task: str = "node",
    ):
        super().__init__()
        self.task = task
        self.dropout = dropout

        self.conv1 = GATConv(in_channels, hidden_channels, heads=heads, dropout=dropout)
        self.conv2 = GATConv(hidden_channels * heads, out_channels, heads=1,
                             concat=False, dropout=dropout)
        self.bn1 = nn.BatchNorm1d(hidden_channels * heads)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor,
                batch: Optional[torch.Tensor] = None) -> torch.Tensor:
        x = F.dropout(x, p=self.dropout, training=self.training)
        x = F.elu(self.bn1(self.conv1(x, edge_index)))
        x = F.dropout(x, p=self.dropout, training=self.training)
        x = self.conv2(x, edge_index)

        if self.task == "graph":
            x = global_mean_pool(x, batch)

        return x  # Raw logits — apply softmax/sigmoid in loss


# ══════════════════════════════════════════════════════════════════════════════
# Architecture 3: GraphSAGE — Inductive / large-graph friendly
# Best for: Large graphs, inductive settings (unseen nodes at test time)
# ══════════════════════════════════════════════════════════════════════════════

class GraphSAGE(nn.Module):
    """
    GraphSAGE (Hamilton et al., 2017).
    Aggregates neighbor features via mean/max/LSTM — scales to large graphs.
    Inductive: can generalize to unseen nodes not in the training graph.

    Args:
        in_channels: Dimension of input node features.
        hidden_channels: Hidden layer dimension.
        out_channels: Number of output classes.
        num_layers: Number of SAGE layers.
        dropout: Dropout rate.
        task: 'node' or 'graph' classification.
    """
    def __init__(
        self, in_channels: int, hidden_channels: int, out_channels: int,
        num_layers: int = 3, dropout: float = 0.5, task: str = "node",
    ):
        super().__init__()
        self.task = task
        self.dropout = dropout

        self.convs = nn.ModuleList()
        self.bns = nn.ModuleList()

        for i in range(num_layers):
            in_ch = in_channels if i == 0 else hidden_channels
            self.convs.append(SAGEConv(in_ch, hidden_channels))
            self.bns.append(nn.BatchNorm1d(hidden_channels))

        self.classifier = nn.Linear(hidden_channels, out_channels)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor,
                batch: Optional[torch.Tensor] = None) -> torch.Tensor:
        for conv, bn in zip(self.convs, self.bns):
            x = F.relu(bn(conv(x, edge_index)))
            x = F.dropout(x, p=self.dropout, training=self.training)

        if self.task == "graph":
            x = global_mean_pool(x, batch)

        return self.classifier(x)


# ══════════════════════════════════════════════════════════════════════════════
# Architecture 4: GIN — Graph Isomorphism Network
# Best for: Graph classification, maximally expressive (Weisfeiler-Leman equiv.)
# ══════════════════════════════════════════════════════════════════════════════

class GIN(nn.Module):
    """
    Graph Isomorphism Network (Xu et al., 2019).
    Most expressive GNN for graph-level tasks in the WL hierarchy.
    Aggregates by: h_v = MLP((1 + eps) * h_v + sum(neighbors))

    Args:
        in_channels: Dimension of input node features.
        hidden_channels: Hidden dimension for each MLP layer.
        out_channels: Number of output classes.
        num_layers: Number of GIN layers (3-5 for graph classification).
        dropout: Dropout rate.
    """
    def __init__(
        self, in_channels: int, hidden_channels: int, out_channels: int,
        num_layers: int = 4, dropout: float = 0.5,
    ):
        super().__init__()
        self.dropout = dropout
        self.convs = nn.ModuleList()
        self.bns = nn.ModuleList()

        for i in range(num_layers):
            in_ch = in_channels if i == 0 else hidden_channels
            mlp = nn.Sequential(
                nn.Linear(in_ch, hidden_channels),
                nn.BatchNorm1d(hidden_channels),
                nn.ReLU(),
                nn.Linear(hidden_channels, hidden_channels),
            )
            self.convs.append(GINConv(mlp, train_eps=True))
            self.bns.append(nn.BatchNorm1d(hidden_channels))

        # Jumping Knowledge: concat all layer outputs before classifier
        self.classifier = nn.Sequential(
            nn.Linear(hidden_channels * num_layers, hidden_channels),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_channels, out_channels),
        )

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor,
                batch: torch.Tensor) -> torch.Tensor:
        layer_outputs: list[torch.Tensor] = []
        for conv, bn in zip(self.convs, self.bns):
            x = F.relu(bn(conv(x, edge_index)))
            x = F.dropout(x, p=self.dropout, training=self.training)
            # Global pooling at each layer (Jumping Knowledge)
            layer_outputs.append(global_add_pool(x, batch))

        # Concatenate all layers' pooled outputs
        x = torch.cat(layer_outputs, dim=1)
        return self.classifier(x)


# ══════════════════════════════════════════════════════════════════════════════
# Training helpers
# ══════════════════════════════════════════════════════════════════════════════

def train_epoch(model: nn.Module, loader: "DataLoader",
                optimizer: "torch.optim.Optimizer",
                criterion: nn.Module, device: str) -> float:
    model.train()
    total_loss = 0.0
    for data in loader:
        data = data.to(device)
        optimizer.zero_grad()
        if model.__class__.__name__ == "GIN" or getattr(model, "task", "") == "graph":
            out = model(data.x, data.edge_index, data.batch)
        else:
            out = model(data.x, data.edge_index)
        if hasattr(data, "train_mask"):
            out = out[data.train_mask]
            target = data.y[data.train_mask]
        else:
            target = data.y
        loss = criterion(out, target)
        loss.backward()
        optimizer.step()
        total_loss += float(loss)
    return total_loss / len(loader)


@torch.no_grad()
def evaluate(model: nn.Module, loader: "DataLoader", device: str) -> float:
    model.eval()
    correct = total = 0
    for data in loader:
        data = data.to(device)
        if model.__class__.__name__ == "GIN" or getattr(model, "task", "") == "graph":
            out = model(data.x, data.edge_index, data.batch)
            pred = out.argmax(dim=1)
            correct += int((pred == data.y).sum())
            total += data.y.size(0)
        else:
            out = model(data.x, data.edge_index)
            if hasattr(data, "test_mask"):
                pred = out[data.test_mask].argmax(dim=1)
                correct += int((pred == data.y[data.test_mask]).sum())
                total += int(data.test_mask.sum())
            else:
                pred = out.argmax(dim=1)
                correct += int((pred == data.y).sum())
                total += data.y.size(0)
    return correct / total if total > 0 else 0.0


# ══════════════════════════════════════════════════════════════════════════════
# Architecture selection guide
# ══════════════════════════════════════════════════════════════════════════════

ARCHITECTURE_GUIDE = """
GNN Architecture Selection Guide — BMAD DL Lifecycle
─────────────────────────────────────────────────────
Task: Node classification
    Homophilic graph (similar nodes connected)  → GCN
    Attention needed (noisy neighbors)          → GAT
    Large / dynamic / inductive graph           → GraphSAGE

Task: Graph classification
    Standard accuracy        → GraphSAGE or GCN + global pool
    Maximum expressiveness   → GIN (recommended)
    Edge features matter     → GAT with edge_attr

Quick model size guide:
    Small dataset (<1K graphs)  → 2 layers, hidden=64
    Medium dataset (1K–50K)     → 3 layers, hidden=128
    Large dataset (50K+)        → 4-5 layers, hidden=256, mini-batch DataLoader

Typical hyperparameter ranges:
    hidden_channels: 64, 128, 256
    num_layers: 2, 3, 4
    dropout: 0.3 – 0.6
    heads (GAT): 4, 8
    learning_rate: 0.001 – 0.01
"""

if __name__ == "__main__":
    print(ARCHITECTURE_GUIDE)
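The GIN docstring above quotes the update rule h_v = MLP((1 + eps) * h_v + sum(neighbors)). As a minimal sketch of what that aggregation does, here is the rule applied to a hypothetical three-node path graph with scalar features and an identity "MLP"; the `gin_update` helper, the toy graph, and the identity MLP are all illustrative assumptions, not part of the package.

```python
# Sketch of the GIN node update h_v = MLP((1 + eps) * h_v + sum(neighbors)),
# with an identity "MLP" and scalar node features, for illustration only.

def gin_update(features, neighbors, eps=0.0, mlp=lambda v: v):
    """One GIN layer over a graph given as an adjacency dict."""
    return {
        v: mlp((1 + eps) * h + sum(features[u] for u in neighbors[v]))
        for v, h in features.items()
    }

# Toy path graph 0 - 1 - 2, one scalar feature per node.
feats = {0: 1.0, 1: 2.0, 2: 3.0}
adj = {0: [1], 1: [0, 2], 2: [1]}

out = gin_update(feats, adj)
# Node 1 keeps itself and sums both neighbors: 2.0 + (1.0 + 3.0) = 6.0
print(out)  # {0: 3.0, 1: 6.0, 2: 5.0}
```

With `train_eps=True`, as in the template, eps becomes a learned scalar that weights a node's own feature against the neighbor sum.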
package/lib/bmad-extension/skills/ml-experiment/assets/template_lightning_module.py
@@ -0,0 +1,158 @@
"""
template_lightning_module.py — BMAD DL Lifecycle
PyTorch Lightning LightningModule template for supervised classification/regression.

Drop this file into your project and fill in the TODO sections.

Usage:
    Copy to src/models/your_model.py and implement:
    - __init__: define layers
    - forward: define forward pass
    - _shared_step: compute loss + metrics for any split
    The training/validation/test steps call _shared_step automatically.
"""

from __future__ import annotations

from typing import Any

import torch
import torch.nn as nn
import torch.nn.functional as F

try:
    import lightning as L
    LightningModule = L.LightningModule
except ImportError:
    try:
        import pytorch_lightning as pl
        LightningModule = pl.LightningModule
    except ImportError:
        raise ImportError(
            "Install PyTorch Lightning: pip install lightning\n"
            "  or: pip install pytorch-lightning"
        )


class YourModel(LightningModule):
    """
    Template LightningModule for image/tabular classification or regression.

    Replace 'YourModel' with a descriptive name (e.g. DefectClassifier, FruitNet).

    Args:
        num_classes: Number of output classes (use 1 for binary/regression).
        learning_rate: Initial learning rate for the optimizer.
        weight_decay: L2 regularization weight.
    """

    def __init__(
        self,
        num_classes: int = 2,
        learning_rate: float = 1e-3,
        weight_decay: float = 1e-4,
    ):
        super().__init__()
        # Saves all __init__ args to self.hparams (enables checkpointing)
        self.save_hyperparameters()

        # ── TODO: Define your model architecture ──────────────────────────────
        # Example: simple two-layer MLP for tabular data
        self.encoder = nn.Sequential(
            nn.Linear(128, 64),  # TODO: replace 128 with your input dim
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Dropout(0.3),
        )
        self.classifier = nn.Linear(64, num_classes)
        # ── END TODO ──────────────────────────────────────────────────────────

        # Loss function
        if num_classes == 1:
            self.loss_fn = nn.BCEWithLogitsLoss()
        else:
            self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """
        Forward pass.

        Args:
            x: Input tensor, shape depends on your architecture.
        Returns:
            Logits tensor of shape (batch, num_classes) or (batch, 1).
        """
        # ── TODO: Implement forward pass ──────────────────────────────────────
        features = self.encoder(x)
        logits = self.classifier(features)
        return logits
        # ── END TODO ──────────────────────────────────────────────────────────

    def _shared_step(self, batch: Any, stage: str) -> torch.Tensor:
        """
        Common logic for train/val/test.

        Args:
            batch: Tuple of (inputs, labels) from your DataLoader.
            stage: One of "train", "val", "test".
        Returns:
            Loss tensor.
        """
        # ── TODO: Unpack batch to match your DataLoader output ────────────────
        x, y = batch  # e.g. (images, labels) or (features, targets)
        # ── END TODO ──────────────────────────────────────────────────────────

        logits = self(x)

        if self.hparams.num_classes == 1:
            loss = self.loss_fn(logits.squeeze(1), y.float())
            preds = (torch.sigmoid(logits.squeeze(1)) > 0.5).long()
        else:
            loss = self.loss_fn(logits, y.long())
            preds = logits.argmax(dim=1)

        acc = (preds == y).float().mean()

        # Log metrics — appears in TensorBoard / W&B / CSV logger
        self.log(f"{stage}/loss", loss, prog_bar=(stage == "val"), on_step=False, on_epoch=True)
        self.log(f"{stage}/acc", acc, prog_bar=True, on_step=False, on_epoch=True)

        return loss

    # ── Lightning hooks (do not rename these) ─────────────────────────────────

    def training_step(self, batch: Any, batch_idx: int) -> torch.Tensor:
        loss = self._shared_step(batch, "train")
        if torch.cuda.is_available():
            self.log("gpu/memory_allocated_gb", torch.cuda.memory_allocated() / 1e9,
                     on_step=True, on_epoch=False, prog_bar=False)
        return loss

    def validation_step(self, batch: Any, batch_idx: int) -> None:
        self._shared_step(batch, "val")

    def test_step(self, batch: Any, batch_idx: int) -> None:
        self._shared_step(batch, "test")

    def configure_optimizers(self) -> dict:
        """
        Set up optimizer and learning rate scheduler.
        Swap optimizer or scheduler as needed.
        """
        optimizer = torch.optim.AdamW(
            self.parameters(),
            lr=self.hparams.learning_rate,
            weight_decay=self.hparams.weight_decay,
        )
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
            optimizer,
            T_max=10,  # TODO: set to your total_epochs
            eta_min=1e-6,
        )
        return {
            "optimizer": optimizer,
            "lr_scheduler": {
                "scheduler": scheduler,
                "monitor": "val/loss",
            },
        }
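The `configure_optimizers` hook in the Lightning template wires a CosineAnnealingLR scheduler with `T_max=10` and `eta_min=1e-6`. That schedule follows a closed-form cosine curve, sketched below with only the standard library; `cosine_annealing_lr` is an illustrative helper (not PyTorch's API), and `base_lr` mirrors the template's default `learning_rate`.

```python
import math

def cosine_annealing_lr(t, base_lr=1e-3, eta_min=1e-6, t_max=10):
    """Learning rate at epoch t under cosine annealing:
    eta_t = eta_min + (base_lr - eta_min) * (1 + cos(pi * t / t_max)) / 2
    """
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t / t_max)) / 2

print(cosine_annealing_lr(0))   # 0.001  -> starts at the base rate
print(cosine_annealing_lr(10))  # 1e-06  -> decays to eta_min at T_max
```

This is why the template flags `T_max` as a TODO: if it is smaller than the number of training epochs, the rate climbs back up after epoch `T_max` instead of staying annealed.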
package/lib/bmad-extension/skills/ml-experiment/skill.json
@@ -0,0 +1,7 @@
{
  "name": "ML Experiment Execution",
  "description": "Executes training runs against a locked TECHSPEC and logs metrics to W&B/MLflow/ClearML.",
  "version": "1.0.0",
  "author": "Demerzel (ML Scientist)",
  "tags": ["Machine Learning", "Experiment", "Training", "Logging", "Demerzel"]
}

package/lib/bmad-extension/skills/ml-hparam/.gitkeep
File without changes
@@ -0,0 +1,81 @@

---
name: ml-hparam
description: Acts as Demerzel (Machine Learning Scientist) to run structured hyperparameter optimization after a baseline architecture is confirmed to work. Uses Optuna, W&B Sweeps, or Ray Tune. Produces a validated best-parameter configuration for the next tuned experiment run.
---

# Machine Learning Workflow: Hyperparameter Optimization (Conditional) — Demerzel

## 1. Operating Instructions

You are **Demerzel**, an expert Machine Learning Scientist running structured hyperparameter search. **This stage is conditional.** Run it only after `ml-analysis` confirms the baseline architecture meets at least the "Worst case (alive)" tier in the TECHSPEC.

Your goal is to find the optimal parameter configuration within the search space defined in the TECHSPEC, then hand off validated parameters to the next `ml-experiment` run.

1. **Verify the prerequisite:** Read the latest `ml-analysis-exp-[id].md`. Confirm that the "Worst case (alive)" tier or better was reached. If not, recommend running `ml-revision` instead.

2. **Read the TECHSPEC:** `_bmad-output/planning-artifacts/techspecs/ml-techspec-exp-[id].md` — use Section C as the HPO search space.

3. **Run the advisor:** `/ml-advise` — check whether any past HPO runs exist.

4. **Run the HPO search** (Optuna, W&B Sweeps, or Ray Tune):
   - Define the objective: maximize the primary metric from the PRD.
   - Use early stopping to save budget.
   - Example (Optuna):
     ```python
     import optuna

     def objective(trial):
         lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
         f1 = train_and_evaluate(lr=lr)  # placeholder: your training loop, returning val f1
         return f1

     study = optuna.create_study(direction="maximize")
     study.optimize(objective, n_trials=50)
     ```
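Optuna's `MedianPruner` is the usual way to realize the "early stopping to save budget" point above; its decision rule can be sketched without the dependency (the function name, warmup count, and history layout here are illustrative, not Optuna's API):

```python
import statistics

def should_prune(step, value, completed_histories, n_warmup_steps=2, direction="maximize"):
    # Median-pruning rule: stop a trial whose intermediate value at `step` is worse
    # than the median of what previously completed trials reported at the same step.
    if step < n_warmup_steps:
        return False  # give every trial a few steps before judging it
    peers = [history[step] for history in completed_histories if len(history) > step]
    if not peers:
        return False  # nothing to compare against yet
    median = statistics.median(peers)
    return value < median if direction == "maximize" else value > median
```

In actual Optuna code the equivalent check is `trial.report(value, step)` followed by `if trial.should_prune(): raise optuna.TrialPruned()`, with the pruner passed via `optuna.create_study(..., pruner=optuna.pruners.MedianPruner())`.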

5. **CRITICAL:** Do not write the HPO report yet. Present the top-5 parameter configurations to the user and ask for sign-off. Halt and wait.

6. Upon confirmation, write `_bmad-output/planning-artifacts/techspecs/ml-hparam-exp-[id].md` with the validated best configuration and update the original TECHSPEC.

7. **Commit the HPO artifact:**
   ```bash
   git add _bmad-output/planning-artifacts/techspecs/ml-hparam-exp-[id].md
   git commit -m "docs(ml-hparam): validated best config for EXP-[id] val/f1=[score]"
   ```

## 2. Expected Output Template

### Template A: `_bmad-output/planning-artifacts/techspecs/ml-hparam-exp-[id].md`

````markdown
# HPO Results: EXP-[ID]

## A. Search Summary
* **Linked Experiment:** EXP-[ID]
* **HPO Tool:** [Optuna / W&B Sweeps / Ray Tune]
* **Sweep URL:** [link]

## B. Top Configurations
| Rank | lr | batch_size | dropout | val/f1 | Run URL |
| :--- | :--- | :--- | :--- | :--- | :--- |
| 1 (best) | 2.3e-4 | 1024 | 0.22 | 0.94 | [link] |

## C. Parameter Importance
* [Which params had the highest impact, from study analysis.]

## D. Validated Best Configuration (copy-paste ready)
```yaml
learning_rate: 2.3e-4
batch_size: 1024
dropout: 0.22
```

## E. Scientist Sign-off
* [ ] Best params are within real-world acceptable ranges.
* [ ] No anomalous values.
* **Signed off by:** [Demerzel / Date]

## F. Next Step
* Run `/ml-experiment` with run_type="tuned" using the above config.
````
@@ -0,0 +1,7 @@

```json
{
  "name": "ML Hyperparameter Optimization",
  "description": "Runs structured HPO using Optuna, W&B Sweeps, or Ray Tune after a baseline is confirmed.",
  "version": "1.0.0",
  "author": "Demerzel (ML Scientist)",
  "tags": ["Machine Learning", "HPO", "Optimization", "Optuna", "Sweeps", "Demerzel"]
}
```

File without changes
@@ -0,0 +1,50 @@

---
name: ml-ideation
description: ML Ideation — Frame the research problem, define success criteria, and produce a Machine Learning PRD
---

# ML Stage 1 — Ideation & PRD

Frame the ML problem rigorously before any data or modelling work begins.

## Instructions

### 1. Elicit Problem Context
Ask the user for (or extract from context):
- **Business problem**: What decision or process will the model improve?
- **Target variable**: What are we predicting? (classification / regression / ranking / generation)
- **Failure cost asymmetry**: What is the cost of a false negative vs a false positive in this domain?
- **Success definition**: What metric threshold constitutes a production-ready model?
- **Data availability**: What raw datasets exist, and where are they located?

### 2. Produce Research Thesis
Write `_bmad-output/planning-artifacts/research-thesis.md` with:
- **Hypothesis**: A single falsifiable statement (e.g. "We can predict X with >Y recall using features A, B, C")
- **Assumptions**: List all assumptions that must hold for the hypothesis to be testable
- **Risks**: Top 3 risks that could invalidate the hypothesis (data quality, label noise, distribution shift)
- **Null Hypothesis**: The baseline we must beat (random, heuristic, or existing system)

### 3. Produce ML PRD
Write `_bmad-output/planning-artifacts/ml-prd.md` with sections:
- **Problem Statement** (1 paragraph)
- **Stakeholders & Users**
- **Success Metrics** (primary metric, secondary metrics, guardrail metrics)
- **Failure Cost Matrix** (FP cost vs FN cost, with domain justification)
- **Data Requirements** (source, volume, freshness, labelling)
- **Out of Scope** (explicit non-goals)
- **Dependencies** (upstream data pipelines, external APIs)
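The failure-cost asymmetry elicited above is not just documentation: it ultimately fixes the model's operating threshold. A minimal stdlib sketch of turning FP/FN costs into a threshold choice (the function name and its arguments are illustrative):

```python
def best_threshold(scores, labels, c_fp, c_fn):
    # Choose the decision threshold that minimizes expected failure cost,
    # where `scores` are model probabilities, `labels` are 0/1 ground truth,
    # and c_fp / c_fn come straight from the PRD's failure cost matrix.
    candidates = sorted(set(scores)) + [1.1]  # 1.1 means "predict nothing positive"

    def expected_cost(t):
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        return c_fp * fp + c_fn * fn

    return min(candidates, key=expected_cost)
```

With symmetric costs a mid-range threshold wins; as `c_fp` grows relative to `c_fn` the chosen threshold climbs, which is exactly the asymmetry the PRD asks you to justify.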

### 4. Surface Dilemmas & Commit Gate

Before presenting and **before any git commit**:

- Identify every framing choice where two or more reasonable options existed (metric threshold, failure cost ratio, scope boundary, two-stage vs single-stage, etc.)
- Format each as: **Dilemma [Letter] — Title** / **Context** / **Options (a/b)** / **Recommendation** / **Your decision:** [blank]
- If all choices were unambiguous, state explicitly: "No open dilemmas."
- **Do NOT commit any artifact until the user has responded and given explicit approval.**

### 5. Confirm & Advance
- Present both documents to the user for review
- Ask: "Do you approve this framing, or would you like to adjust the hypothesis or success criteria?"
- On approval: commit the artifacts, then say "Stage 1 complete. When ready, proceed to **Stage 2 — /ml-eda** to analyze the raw data."
- STOP and WAIT for user confirmation before advancing