npm - musubi-sdd - Versions diffs - 3.0.0 → 3.0.1 - Mend

musubi-sdd 3.0.0 → 3.0.1

This diff shows the contents of two publicly released versions of the package as they appear in the public registry. It is provided for informational purposes only.

Files changed (57)

package/bin/musubi-browser.js CHANGED

File without changes

package/bin/musubi-convert.js CHANGED

File without changes

package/bin/musubi-gui.js CHANGED

File without changes

package/bin/musubi-validate.js CHANGED

@@ -184,16 +184,6 @@ program
     }
   });
-// All validations (duplicate removed, keeping original)
-      displayResults('Complexity Validation', results, options);
-      process.exit(results.passed ? 0 : 1);
-    } catch (error) {
-      console.error(chalk.red('✗ Validation error:'), error.message);
-      process.exit(1);
-    }
-  });
 // All validations
 program
   .command('all')

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "musubi-sdd",
-  "version": "3.0.0",
+  "version": "3.0.1",
   "description": "Ultimate Specification Driven Development Tool with 27 Agents for 7 AI Coding Platforms + MCP Integration (Claude Code, GitHub Copilot, Cursor, Gemini CLI, Windsurf, Codex, Qwen Code)",
   "main": "src/index.js",
   "bin": {

package/src/templates/agents/claude-code/skills/ai-ml-engineer/mlops-guide.md ADDED

@@ -0,0 +1,350 @@
+# MLOps Guide
+## Overview
+Best practices for Machine Learning Operations (MLOps) in production systems.
+---
+## MLOps Lifecycle
+```
+┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
+│  Data    │───▶│  Train   │───▶│  Deploy  │───▶│ Monitor  │
+└──────────┘    └──────────┘    └──────────┘    └──────────┘
+     │               │               │               │
+     └───────────────┴───────────────┴───────────────┘
+                    Continuous Loop
+```
+---
+## 1. Data Management
+### Data Versioning
+```bash
+# Using DVC (Data Version Control)
+dvc init
+dvc add data/training_data.csv
+git add data/training_data.csv.dvc .gitignore
+git commit -m "Add training data v1"
+```
+### Data Pipeline
+```python
+# data_pipeline.py
+import pandas as pd
+from prefect import flow, task
+@task
+def extract_data(source: str) -> pd.DataFrame:
+    return pd.read_csv(source)
+@task
+def transform_data(df: pd.DataFrame) -> pd.DataFrame:
+    # Feature engineering
+    df['feature_1'] = df['col_a'] * df['col_b']
+    return df
+@task
+def validate_data(df: pd.DataFrame) -> bool:
+    # Data quality checks
+    assert df['feature_1'].isnull().sum() == 0
+    return True
+@flow
+def data_pipeline(source: str):
+    df = extract_data(source)
+    df = transform_data(df)
+    validate_data(df)
+    return df
+```
+---
+## 2. Experiment Tracking
+### MLflow Setup
+```python
+import mlflow
+from mlflow.tracking import MlflowClient
+# Set tracking URI
+mlflow.set_tracking_uri("http://mlflow-server:5000")
+mlflow.set_experiment("my-experiment")
+# Log experiment
+with mlflow.start_run():
+    # Log parameters
+    mlflow.log_param("learning_rate", 0.01)
+    mlflow.log_param("epochs", 100)
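+    # Train model (train_model, X_train, and y_train are assumed to be defined elsewhere)
+    model = train_model(X_train, y_train)
+    # Log metrics (accuracy and f1 are assumed to be computed on a held-out set)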
+    mlflow.log_metric("accuracy", accuracy)
+    mlflow.log_metric("f1_score", f1)
+    # Log model
+    mlflow.sklearn.log_model(model, "model")
+    # Log artifacts
+    mlflow.log_artifact("feature_importance.png")
+```
+---
+## 3. Model Registry
+### Model Versioning
+```python
+from mlflow.tracking import MlflowClient
+client = MlflowClient()
+# Register model (run_id is assumed to be the ID of a finished MLflow run)
+model_uri = f"runs:/{run_id}/model"
+mv = client.create_model_version(
+    name="my-model",
+    source=model_uri,
+    run_id=run_id
+)
+# Transition to staging
+client.transition_model_version_stage(
+    name="my-model",
+    version=mv.version,
+    stage="Staging"
+)
+# Promote to production
+client.transition_model_version_stage(
+    name="my-model",
+    version=mv.version,
+    stage="Production"
+)
+```
+### Model Metadata
+```yaml
+# model_metadata.yaml
+model:
+  name: fraud-detector
+  version: 2.1.0
+  framework: scikit-learn
+training:
+  date: 2024-01-15
+  dataset_version: v1.2
+  metrics:
+    accuracy: 0.95
+    f1_score: 0.92
+requirements:
+  - scikit-learn==1.3.0
+  - pandas==2.0.0
+schema:
+  input:
+    - name: amount
+      type: float
+    - name: category
+      type: string
+  output:
+    - name: is_fraud
+      type: boolean
+    - name: confidence
+      type: float
+```
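
The `schema.input` section of the metadata above can be enforced when a request arrives. The following is a minimal sketch, not part of the package, assuming the schema has already been loaded into a Python list of dicts. `INPUT_SCHEMA`, `TYPE_MAP`, and `validate_features` are hypothetical names introduced for illustration.

```python
# Hypothetical mirror of the `schema.input` entries in model_metadata.yaml.
INPUT_SCHEMA = [
    {"name": "amount", "type": "float"},
    {"name": "category", "type": "string"},
]

# Assumed mapping from the YAML type names to Python types.
TYPE_MAP = {"float": (int, float), "string": (str,), "boolean": (bool,)}


def validate_features(features: dict, schema: list = INPUT_SCHEMA) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    expected = {field["name"] for field in schema}
    for field in schema:
        name, type_name = field["name"], field["type"]
        if name not in features:
            problems.append(f"missing field: {name}")
            continue
        value = features[name]
        # bool is a subclass of int, so reject it explicitly for numeric fields.
        if type_name == "float" and isinstance(value, bool):
            problems.append(f"wrong type for {name}: expected float")
        elif not isinstance(value, TYPE_MAP[type_name]):
            problems.append(f"wrong type for {name}: expected {type_name}")
    for name in sorted(set(features) - expected):
        problems.append(f"unexpected field: {name}")
    return problems
```

A serving endpoint could call `validate_features(features)` before building the DataFrame and reject the request when the returned list is not empty.
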
+---
+## 4. Model Deployment
+### Serving with FastAPI
+```python
+import mlflow
+import pandas as pd
+from fastapi import FastAPI
+app = FastAPI()
+# Load model on startup
+model = mlflow.sklearn.load_model("models:/my-model/Production")
+@app.post("/predict")
+async def predict(features: dict):
+    df = pd.DataFrame([features])
+    prediction = model.predict(df)
+    probability = model.predict_proba(df)
+    return {
+        "prediction": int(prediction[0]),
+        "confidence": float(probability[0].max())
+    }
+@app.get("/health")
+async def health():
+    return {"status": "healthy", "model_version": "2.1.0"}
+```
+### Deployment Strategies
+| Strategy | Description | Use Case |
+|----------|-------------|----------|
+| Shadow | Run parallel to existing | Validate new model |
+| Canary | Gradual traffic shift | Safe rollout |
+| Blue-Green | Full switch | Quick rollback |
+| A/B Test | Split traffic | Compare models |
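
The canary row in the table above can be illustrated with a small traffic splitter. This is a sketch under stated assumptions, not code from the package: `route_request` and `canary_percent` are hypothetical names, and hashing the request ID is one possible way to keep routing stable across calls.

```python
import hashlib


def route_request(request_id: str, canary_percent: int) -> str:
    """Return "canary" or "stable" for a request, deterministically."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # Hash the ID so the same request or user always lands in the same bucket.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"
```

Because the bucket depends only on the ID, raising `canary_percent` from 10 to 50 keeps every ID that was already on the canary model there and only adds new ones, which is what makes the shift gradual.
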
+---
+## 5. Monitoring
+### Prediction Logging
+```python
+import json
+import logging
+from datetime import datetime, timezone
+def log_prediction(request, response, latency_ms):
+    # Serialize to JSON so each prediction is one machine-parseable log line
+    logging.info(json.dumps({
+        # Timezone-aware UTC timestamp
+        "timestamp": datetime.now(timezone.utc).isoformat(),
+        "request_id": request.id,
+        "features": request.features,
+        "prediction": response.prediction,
+        "confidence": response.confidence,
+        "latency_ms": latency_ms,
+        "model_version": "2.1.0"
+    }))
+```
+### Data Drift Detection
+```python
+from scipy import stats
+def detect_drift(reference_data, current_data, threshold=0.05):
+    """Detect distribution drift using KS test."""
+    drifted_features = []
+    for column in reference_data.columns:
+        statistic, p_value = stats.ks_2samp(
+            reference_data[column],
+            current_data[column]
+        )
+        if p_value < threshold:
+            drifted_features.append({
+                "feature": column,
+                "p_value": p_value,
+                "statistic": statistic
+            })
+    return drifted_features
+```
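
The `statistic` value that `stats.ks_2samp` returns in the block above is the largest vertical gap between the empirical distribution functions of the two samples. The standard-library sketch below shows that quantity for a single numeric feature. It is illustrative only: `ks_statistic` is a hypothetical helper, and it does not compute the p-value that `detect_drift` compares against `threshold`.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap
```

Identical samples give 0.0 and samples with no overlap give 1.0, so larger values indicate a stronger shift in the feature's distribution.
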
+### Performance Metrics
+```python
+# Prometheus metrics for ML
+from prometheus_client import Counter, Histogram, Gauge
+prediction_counter = Counter(
+    'model_predictions_total',
+    'Total predictions',
+    ['model_version', 'prediction_class']
+)
+prediction_latency = Histogram(
+    'model_prediction_latency_seconds',
+    'Prediction latency',
+    ['model_version']
+)
+model_accuracy = Gauge(
+    'model_accuracy',
+    'Current model accuracy',
+    ['model_version']
+)
+```
+---
+## 6. CI/CD for ML
+### GitHub Actions Pipeline
+```yaml
+# .github/workflows/ml-pipeline.yml
+name: ML Pipeline
+on:
+  push:
+    paths:
+      - 'models/**'
+      - 'data/**'
+jobs:
+  train:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Setup Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.10'
+      - name: Install dependencies
+        run: pip install -r requirements.txt
+      - name: Pull data
+        run: dvc pull
+      - name: Train model
+        run: python train.py
+      - name: Evaluate model
+        run: python evaluate.py
+      - name: Register model
+        if: github.ref == 'refs/heads/main'
+        run: python register_model.py
+```
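
The workflow above only reaches the "Register model" step if `python evaluate.py` exits with status 0. The package diff does not show `evaluate.py`, so the following is a hedged sketch of what such a gate might look like. The threshold values, the JSON metrics file, and the names `THRESHOLDS`, `failed_checks`, and `main` are all assumptions.

```python
import json

# Hypothetical minimum values; pick thresholds that suit the model.
THRESHOLDS = {"accuracy": 0.90, "f1_score": 0.85}


def failed_checks(metrics: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return the names of metrics that are missing or below their threshold."""
    return [
        name
        for name, minimum in thresholds.items()
        if metrics.get(name, float("-inf")) < minimum
    ]


def main(metrics_path: str) -> int:
    """Read a JSON metrics file and return a process exit code."""
    with open(metrics_path, encoding="utf-8") as handle:
        metrics = json.load(handle)
    failures = failed_checks(metrics)
    for name in failures:
        print(f"FAIL {name}: {metrics.get(name)} < {THRESHOLDS[name]}")
    return 1 if failures else 0
```

A script entry point would pass the path of the metrics file to `main` and hand the result to `sys.exit`, so a failed check stops the job before registration.
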
+---
+## 7. Best Practices
+### Reproducibility Checklist
+- [ ] Code versioned in Git
+- [ ] Data versioned with DVC
+- [ ] Dependencies pinned (requirements.txt)
+- [ ] Random seeds set
+- [ ] Experiments logged in MLflow
+- [ ] Model artifacts stored
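
The "Random seeds set" item in the checklist above can be covered by one helper that is called at the start of every training run. This is a minimal sketch using only the standard library. `set_global_seed` is a hypothetical name, and projects that use NumPy or a deep learning framework would need to seed those libraries as well.

```python
import os
import random


def set_global_seed(seed: int) -> None:
    """Seed Python's built-in RNG; extend for NumPy or a DL framework if used."""
    random.seed(seed)
    # Only affects child processes started after this point, not the current one.
    os.environ["PYTHONHASHSEED"] = str(seed)
```

Calling `set_global_seed` with the same value before two runs makes `random` produce the same sequence in both.
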
+### Model Validation Checklist
+- [ ] Performance metrics acceptable
+- [ ] No data leakage
+- [ ] Fairness metrics checked
+- [ ] Edge cases tested
+- [ ] Latency requirements met
+- [ ] Memory usage acceptable
+### Production Checklist
+- [ ] Model card documented
+- [ ] API versioned
+- [ ] Health checks implemented
+- [ ] Monitoring in place
+- [ ] Rollback procedure defined
+- [ ] A/B test framework ready

package/src/templates/agents/claude-code/skills/ai-ml-engineer/model-card-template.md ADDED

@@ -0,0 +1,246 @@
+# Model Card Template
+## Overview
+Template for documenting machine learning models following best practices.
+---
+## Model Card Document
+```markdown
+# Model Card: [Model Name]
+## Model Details
+### Basic Information
+| Field | Value |
+|-------|-------|
+| Model Name | [Name] |
+| Version | [X.Y.Z] |
+| Type | [Classification/Regression/etc.] |
+| Framework | [TensorFlow/PyTorch/scikit-learn] |
+| Date | YYYY-MM-DD |
+| Authors | [Team/Names] |
+### Description
+[Brief description of what the model does]
+### Intended Use
+- **Primary Use Cases**: [What it's designed for]
+- **Intended Users**: [Who should use it]
+- **Out-of-Scope Uses**: [What it shouldn't be used for]
+---
+## Model Architecture
+### Overview
+[Description of model architecture]
+### Inputs
+| Name | Type | Shape | Description |
+|------|------|-------|-------------|
+| feature_1 | float | (1,) | Transaction amount |
+| feature_2 | int | (1,) | Category code |
+### Outputs
+| Name | Type | Shape | Description |
+|------|------|-------|-------------|
+| prediction | int | (1,) | Class label |
+| probability | float | (n_classes,) | Class probabilities |
+### Hyperparameters
+| Parameter | Value |
+|-----------|-------|
+| learning_rate | 0.001 |
+| batch_size | 32 |
+| epochs | 100 |
+---
+## Training Data
+### Dataset Description
+| Field | Value |
+|-------|-------|
+| Name | [Dataset name] |
+| Version | [Version] |
+| Size | [N samples] |
+| Date Range | [Start] to [End] |
+### Data Distribution
+| Feature | Distribution |
+|---------|--------------|
+| Class 0 | 85% |
+| Class 1 | 15% |
+### Preprocessing
+- [Step 1]: [Description]
+- [Step 2]: [Description]
+### Data Splits
+| Split | Size | Purpose |
+|-------|------|---------|
+| Train | 70% | Model training |
+| Validation | 15% | Hyperparameter tuning |
+| Test | 15% | Final evaluation |
+---
+## Evaluation
+### Metrics
+| Metric | Value | Threshold |
+|--------|-------|-----------|
+| Accuracy | 0.95 | > 0.90 |
+| Precision | 0.92 | > 0.85 |
+| Recall | 0.88 | > 0.80 |
+| F1 Score | 0.90 | > 0.85 |
+| AUC-ROC | 0.97 | > 0.90 |
+### Confusion Matrix
+~~~
+              Predicted
+              0     1
+Actual  0   850    50
+        1    30   120
+~~~
+### Performance by Subgroup
+| Subgroup | Accuracy | Size |
+|----------|----------|------|
+| Group A | 0.96 | 400 |
+| Group B | 0.94 | 350 |
+| Group C | 0.93 | 250 |
+---
+## Fairness & Bias
+### Evaluation
+| Metric | Group A | Group B | Threshold |
+|--------|---------|---------|-----------|
+| TPR | 0.89 | 0.87 | Δ < 0.05 ✅ |
+| FPR | 0.08 | 0.09 | Δ < 0.05 ✅ |
+| PPV | 0.91 | 0.89 | Δ < 0.05 ✅ |
+### Mitigation Steps
+- [Step taken to address bias]
+### Known Limitations
+- [Limitation 1]
+- [Limitation 2]
+---
+## Ethical Considerations
+### Potential Risks
+- [Risk 1]: [Mitigation]
+- [Risk 2]: [Mitigation]
+### Use Cases to Avoid
+- [Should not be used for X]
+---
+## Deployment
+### Requirements
+~~~
+python>=3.9
+tensorflow==2.12.0
+numpy==1.24.0
+~~~
+### Resource Requirements
+| Resource | Minimum | Recommended |
+|----------|---------|-------------|
+| CPU | 2 cores | 4 cores |
+| Memory | 2 GB | 4 GB |
+| GPU | - | NVIDIA T4 |
+### Latency
+| Percentile | Latency |
+|------------|---------|
+| p50 | 15ms |
+| p95 | 45ms |
+| p99 | 80ms |
+### Endpoints
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| /predict | POST | Get prediction |
+| /health | GET | Health check |
+---
+## Monitoring
+### Metrics to Track
+- Prediction distribution
+- Latency percentiles
+- Error rate
+- Data drift indicators
+### Alerting Thresholds
+| Metric | Warning | Critical |
+|--------|---------|----------|
+| Latency p99 | > 100ms | > 200ms |
+| Error Rate | > 1% | > 5% |
+| Drift Score | > 0.1 | > 0.2 |
+### Retraining Triggers
+- [Trigger 1]: [Condition]
+- [Trigger 2]: [Condition]
+---
+## Version History
+| Version | Date | Changes |
+|---------|------|---------|
+| 1.0.0 | YYYY-MM-DD | Initial release |
+| 1.1.0 | YYYY-MM-DD | Added feature X |
+| 2.0.0 | YYYY-MM-DD | Major architecture change |
+---
+## References
+- [Link to training code]
+- [Link to data documentation]
+- [Link to related papers]
+- [Link to API documentation]
+---
+## Contact
+For questions or issues:
+- Team: [Team name]
+- Email: [Contact email]
+- Slack: [Channel]
+```
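
The numbers in the template are placeholders: the example confusion matrix does not reproduce the values in the Metrics table (its accuracy works out to about 0.924, not 0.95). When filling in a real card, deriving the metrics from the matrix keeps the two sections consistent. The sketch below assumes a binary classifier with class 1 as the positive class. `metrics_from_confusion` is a hypothetical helper.

```python
def metrics_from_confusion(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Derive standard binary-classification metrics from confusion-matrix counts."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
    }


# Counts from the template's example matrix (rows are actual, columns predicted).
example = metrics_from_confusion(tn=850, fp=50, fn=30, tp=120)
```

For the example counts this gives accuracy of about 0.924, precision of about 0.706, recall of 0.80, and an F1 score of 0.75.
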
+---
+## Quick Checklist
+### Before Release
+- [ ] Model architecture documented
+- [ ] Training data described
+- [ ] Evaluation metrics included
+- [ ] Fairness analysis completed
+- [ ] Ethical risks assessed
+- [ ] Deployment requirements listed
+- [ ] Monitoring plan defined
+- [ ] Version history updated
+### Review Questions
+1. Is the intended use clearly defined?
+2. Are limitations and risks documented?
+3. Can another team reproduce training?
+4. Are bias metrics acceptable?
+5. Is there a monitoring plan?
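
One way to make the monitoring plan checkable is to encode the template's "Alerting Thresholds" table as data. The sketch below is an assumption about how that could look, not part of the package. The metric keys in `ALERT_LIMITS` and the `alert_level` function are hypothetical, and the limits are the template's placeholder values.

```python
# Warning and critical limits copied from the template's "Alerting Thresholds" table.
ALERT_LIMITS = {
    "latency_p99_ms": (100.0, 200.0),
    "error_rate": (0.01, 0.05),
    "drift_score": (0.1, 0.2),
}


def alert_level(metric: str, value: float) -> str:
    """Classify a metric reading as "ok", "warning", or "critical"."""
    warning, critical = ALERT_LIMITS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"
```

The comparisons are strict, matching the `>` signs in the table, so a reading exactly at a limit stays at the lower level.
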