npm - maestro-bundle - Versions diffs - 1.3.1 → 1.4.0 - Mend

maestro-bundle 1.3.1 → 1.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (116)

package/templates/bundle-data-pipeline/skills/mlops-pipeline/SKILL.md CHANGED

@@ -1,77 +1,196 @@
 ---
 name: mlops-pipeline
-description: Criar pipelines MLOps com MLflow para tracking, model registry e deployment automatizado. Use quando precisar versionar modelos, automatizar treino, ou configurar model registry.
+description: Build MLOps pipelines with MLflow for experiment tracking, model registry, and automated deployment. Use when you need to version models, track experiments, automate training pipelines, or configure a model registry.
+version: 1.0.0
+author: Maestro
 ---
 # MLOps Pipeline
-## MLflow Tracking
+Set up end-to-end MLOps workflows using MLflow for experiment tracking, model versioning, and automated training pipelines.
+## When to Use
+- User needs to track experiments (parameters, metrics, artifacts)
+- User wants to version and register models
+- User needs to compare runs and select the best model
+- User wants to automate a training pipeline with promotion logic
+- User needs to serve a model via MLflow
+## Available Operations
+1. Set up MLflow tracking server
+2. Log experiments (params, metrics, artifacts, models)
+3. Register models in the Model Registry
+4. Promote models through stages (Staging -> Production)
+5. Build an automated training pipeline with comparison logic
+6. Serve a model via MLflow REST API
+## Multi-Step Workflow
+### Step 1: Install Dependencies
+```bash
+pip install mlflow scikit-learn pandas boto3
+```
+### Step 2: Start MLflow Tracking Server (Local Development)
+```bash
+mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./mlflow-artifacts
+```
+For production, use a remote tracking URI:
+```bash
+export MLFLOW_TRACKING_URI=http://mlflow.your-domain.com
+```
+### Step 3: Create an Experiment and Log a Run
 ```python
 import mlflow
+from sklearn.ensemble import RandomForestClassifier
+from sklearn.metrics import accuracy_score, f1_score, precision_score
-mlflow.set_tracking_uri("http://mlflow.maestro.local")
-mlflow.set_experiment("compliance-classifier")
+mlflow.set_tracking_uri("http://localhost:5000")
+mlflow.set_experiment("my-classifier")
-with mlflow.start_run(run_name="rf-v1"):
-    mlflow.log_params({
-        "n_estimators": 200,
-        "max_depth": 20,
-        "cv_folds": 5
-    })
+with mlflow.start_run(run_name="rf-baseline"):
+    # Log parameters
+    params = {"n_estimators": 200, "max_depth": 20, "cv_folds": 5}
+    mlflow.log_params(params)
+    # Train model
+    model = RandomForestClassifier(**{k: v for k, v in params.items() if k != "cv_folds"}, random_state=42)
     model.fit(X_train, y_train)
     y_pred = model.predict(X_test)
-    mlflow.log_metrics({
+    # Log metrics
+    metrics = {
         "accuracy": accuracy_score(y_test, y_pred),
-        "f1": f1_score(y_test, y_pred, average='weighted'),
-        "precision": precision_score(y_test, y_pred, average='weighted'),
-    })
+        "f1": f1_score(y_test, y_pred, average="weighted"),
+        "precision": precision_score(y_test, y_pred, average="weighted"),
+    }
+    mlflow.log_metrics(metrics)
+    print(f"Metrics: {metrics}")
+    # Log model
     mlflow.sklearn.log_model(model, "model")
+    print(f"Run ID: {mlflow.active_run().info.run_id}")
+```
+### Step 4: Compare Runs
+```bash
+mlflow runs list --experiment-id 1
 ```
-## Model Registry
+Or programmatically:
+```python
+import mlflow
+experiment = mlflow.get_experiment_by_name("my-classifier")
+runs = mlflow.search_runs(experiment_ids=[experiment.experiment_id], order_by=["metrics.f1 DESC"])
+print(runs[["run_id", "params.n_estimators", "metrics.f1", "metrics.accuracy"]].head(10))
+```
+### Step 5: Register the Best Model
 ```python
-# Registrar modelo
+run_id = runs.iloc[0]["run_id"]  # best run by F1
 model_uri = f"runs:/{run_id}/model"
-mlflow.register_model(model_uri, "compliance-classifier")
-# Promover para produção
+result = mlflow.register_model(model_uri, "my-classifier")
+print(f"Registered model version: {result.version}")
+```
+### Step 6: Promote to Production
+```python
 client = mlflow.MlflowClient()
+# Move to staging first
 client.transition_model_version_stage(
-    name="compliance-classifier",
-    version=2,
-    stage="Production"
+    name="my-classifier", version=result.version, stage="Staging"
 )
-```
+print(f"Model v{result.version} moved to Staging")
-## Pipeline automatizado
+# After validation, promote to production
+client.transition_model_version_stage(
+    name="my-classifier", version=result.version, stage="Production"
+)
+print(f"Model v{result.version} promoted to Production")
+```
+### Step 7: Build Automated Training Pipeline
 ```python
 # pipelines/training.py
-def training_pipeline():
-    """Pipeline completo: dados → treino → avaliação → registro"""
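+import mlflow
+import pandas as pd
+from sklearn.model_selection import train_test_split
+# train_model() and evaluate_model() are not defined in this snippet; supply your own.
+# evaluate_model() must return a dict of metrics that includes "f1".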
-    # 1. Carregar dados
-    df = load_latest_data()
+def training_pipeline(data_path: str, experiment_name: str, model_name: str):
+    """End-to-end pipeline: load -> train -> evaluate -> register if better."""
+    mlflow.set_experiment(experiment_name)
-    # 2. Preprocessar
-    X, y = preprocess(df)
-    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
+    # 1. Load and split
+    df = pd.read_parquet(data_path)
+    X = df.drop(columns=["target"])
+    y = df["target"]
+    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
-    # 3. Treinar com tracking
+    # 2. Train with tracking
     with mlflow.start_run():
         model = train_model(X_train, y_train)
         metrics = evaluate_model(model, X_test, y_test)
         mlflow.log_metrics(metrics)
+        mlflow.sklearn.log_model(model, "model")
+        # 3. Compare with current production
+        client = mlflow.MlflowClient()
+        try:
+            prod_versions = client.get_latest_versions(model_name, stages=["Production"])
+            prod_run = client.get_run(prod_versions[0].run_id)
+            prod_f1 = float(prod_run.data.metrics.get("f1", 0))
+        except Exception:
+            prod_f1 = 0.0
+        # 4. Register if better
+        if metrics["f1"] > prod_f1:
+            result = mlflow.register_model(f"runs:/{mlflow.active_run().info.run_id}/model", model_name)
+            client.transition_model_version_stage(name=model_name, version=result.version, stage="Staging")
+            print(f"New candidate v{result.version}: F1={metrics['f1']:.3f} > Production F1={prod_f1:.3f}")
+        else:
+            print(f"Model not better: F1={metrics['f1']:.3f} <= Production F1={prod_f1:.3f}")
+if __name__ == "__main__":
+    training_pipeline("data/processed/dataset.parquet", "my-classifier", "my-classifier")
+```
-        # 4. Registrar se melhor que produção
-        prod_metrics = get_production_metrics()
-        if metrics['f1'] > prod_metrics.get('f1', 0):
-            mlflow.sklearn.log_model(model, "model")
-            register_as_candidate(model)
-            notify_team("Novo modelo candidato disponível")
+### Step 8: Serve Model via REST API
+```bash
+mlflow models serve -m "models:/my-classifier/Production" --port 5001 --env-manager local
 ```
+Test the endpoint:
+```bash
+curl -X POST http://localhost:5001/invocations -H "Content-Type: application/json" -d '{"inputs": [{"age": 30, "salary": 50000}]}'
+```
+## Resources
+- `references/mlflow-commands.md` - MLflow CLI and Python API quick reference
+## Examples
+### Example 1: Track a New Experiment
+User asks: "Set up experiment tracking for our fraud detection model"
+Response approach:
+1. Install mlflow and set tracking URI
+2. Create experiment with `mlflow.set_experiment("fraud-detection")`
+3. Log params, metrics, and model inside `mlflow.start_run()`
+4. Show how to view results in MLflow UI at http://localhost:5000
+### Example 2: Promote a Model
+User asks: "Our latest model passed validation, deploy it to production"
+Response approach:
+1. Find the latest Staging model version with `get_latest_versions`
+2. Transition to Production with `transition_model_version_stage`
+3. Verify with `mlflow models serve`
+4. Test the endpoint with curl
+## Notes
+- Always log both parameters and metrics for every run
+- Use `mlflow.autolog()` for automatic logging with sklearn, pytorch, etc.
+- Set meaningful run names for easier comparison in the UI
+- Never skip the baseline comparison step when training new models
+- For team setups, use a shared PostgreSQL backend instead of SQLite
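
The "register if better" check in Step 7 of the mlops-pipeline skill can be isolated into a small function that is easy to test without a tracking server. The sketch below is one possible shape and is not part of the package: `should_promote` and its `min_delta` margin are hypothetical names introduced here, and the `0.0` fallback simply mirrors the `prod_f1 = 0.0` default used in the diff.

```python
from typing import Mapping, Optional


def should_promote(
    candidate: Mapping[str, float],
    production: Optional[Mapping[str, float]],
    metric: str = "f1",
    min_delta: float = 0.0,
) -> bool:
    """Return True when the candidate beats production on `metric` by more than `min_delta`."""
    if metric not in candidate:
        raise KeyError(f"candidate run has no '{metric}' metric")
    # No production model (or no recorded metric) counts as a score of 0.0,
    # mirroring the `prod_f1 = 0.0` fallback in Step 7.
    prod_score = 0.0 if production is None else production.get(metric, 0.0)
    return candidate[metric] > prod_score + min_delta


print(should_promote({"f1": 0.91}, {"f1": 0.88}))                   # candidate is better
print(should_promote({"f1": 0.88}, {"f1": 0.88}))                   # a tie is not promoted
print(should_promote({"f1": 0.60}, None))                           # no production model yet
print(should_promote({"f1": 0.885}, {"f1": 0.88}, min_delta=0.01))  # gain below the margin
```

Keeping the decision separate from the MLflow calls means the strict `>` comparison and the fallback behaviour can be unit-tested, while the pipeline itself only fetches the two metric dictionaries and acts on the result.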

package/templates/bundle-data-pipeline/skills/mlops-pipeline/references/mlflow-commands.md ADDED

@@ -0,0 +1,69 @@
+# MLflow Quick Reference
+## CLI Commands
+```bash
+# Start tracking server
+mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri sqlite:///mlflow.db
+# List experiments
+mlflow experiments search  # MLflow 2.x; on 1.x the command was `mlflow experiments list`
+# List runs in an experiment
+mlflow runs list --experiment-id 1
+# Serve a model
+mlflow models serve -m "models:/model-name/Production" --port 5001 --env-manager local
+# Download artifacts
+mlflow artifacts download -r <run-id> -d ./downloaded-artifacts
+```
+## Python API - Tracking
+```python
+import mlflow
+mlflow.set_tracking_uri("http://localhost:5000")
+mlflow.set_experiment("experiment-name")
+# Auto-log everything (sklearn, pytorch, etc.)
+mlflow.autolog()
+# Manual logging
+with mlflow.start_run(run_name="descriptive-name"):
+    mlflow.log_param("learning_rate", 0.01)
+    mlflow.log_params({"epochs": 100, "batch_size": 32})
+    mlflow.log_metric("loss", 0.5)
+    mlflow.log_metrics({"accuracy": 0.95, "f1": 0.92})
+    mlflow.log_artifact("path/to/file.csv")
+    mlflow.sklearn.log_model(model, "model")
+```
+## Python API - Model Registry
+```python
+client = mlflow.MlflowClient()
+# Register model
+mlflow.register_model("runs:/<run-id>/model", "model-name")
+# List versions
+versions = client.search_model_versions("name='model-name'")
+# Get latest production version
+prod = client.get_latest_versions("model-name", stages=["Production"])
+# Transition stage
+client.transition_model_version_stage("model-name", version=1, stage="Production")
+# Load production model
+model = mlflow.sklearn.load_model("models:/model-name/Production")
+```
+## Python API - Search Runs
+```python
+runs = mlflow.search_runs(
+    experiment_ids=["1"],
+    filter_string="metrics.f1 > 0.9",
+    order_by=["metrics.f1 DESC"],
+    max_results=10
+)
+```
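
The `filter_string` passed to `mlflow.search_runs` is plain text, so it can be assembled from a dictionary of thresholds when more than one metric matters. The helper below is a minimal sketch and not an MLflow API: `build_metric_filter` is a hypothetical name, and it assumes MLflow's search syntax, in which clauses are combined with `AND`.

```python
from typing import Mapping


def build_metric_filter(min_metrics: Mapping[str, float]) -> str:
    """Build a filter string that requires each metric to exceed its threshold."""
    # One "metrics.<name> > <threshold>" clause per entry, joined with AND.
    clauses = [f"metrics.{name} > {threshold}" for name, threshold in min_metrics.items()]
    return " AND ".join(clauses)


filter_string = build_metric_filter({"f1": 0.9, "accuracy": 0.85})
print(filter_string)
```

The resulting string can be passed as `filter_string=` in the `mlflow.search_runs` call shown in the quick reference.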

package/templates/bundle-data-pipeline/skills/model-training/SKILL.md CHANGED

@@ -1,68 +1,187 @@
 ---
 name: model-training
-description: Treinar modelos de ML com Scikit-learn incluindo pipeline de preprocessing, cross-validation e hyperparameter tuning. Use quando for treinar modelos, fazer cross-validation, ou otimizar hiperparâmetros.
+description: Train ML models with scikit-learn including preprocessing pipelines, cross-validation, hyperparameter tuning, and evaluation. Use when you need to train a classifier or regressor, run cross-validation, tune hyperparameters, or compare models against baselines.
+version: 1.0.0
+author: Maestro
 ---
 # Model Training
-## Pipeline completo
+Train, evaluate, and export ML models using scikit-learn pipelines with proper cross-validation and hyperparameter tuning.
+## When to Use
+- User wants to train a classification or regression model
+- User needs cross-validation scores for model selection
+- User wants to tune hyperparameters with GridSearch or RandomizedSearch
+- User needs to compare a model against a baseline
+- User wants to save a trained model for deployment
+## Available Operations
+1. Build a full sklearn Pipeline (preprocessing + model)
+2. Run cross-validation with multiple scoring metrics
+3. Tune hyperparameters with GridSearchCV or RandomizedSearchCV
+4. Evaluate on held-out test set with classification_report / regression metrics
+5. Compare against baseline (DummyClassifier/DummyRegressor)
+6. Save the best model with joblib
+## Multi-Step Workflow
+### Step 1: Install Dependencies
+```bash
+pip install scikit-learn pandas numpy joblib
+```
+### Step 2: Load Prepared Data
+```python
+import pandas as pd
+from sklearn.model_selection import train_test_split
+df = pd.read_parquet("data/processed/dataset_clean.parquet")
+X = df.drop(columns=["target"])
+y = df["target"]
+X_train, X_test, y_train, y_test = train_test_split(
+    X, y, test_size=0.2, random_state=42, stratify=y  # stratify for classification
+)
+print(f"Train: {X_train.shape}, Test: {X_test.shape}")
+print(f"Class distribution:\n{y_train.value_counts(normalize=True)}")
+```
+### Step 3: Build Preprocessing + Model Pipeline
 ```python
 from sklearn.pipeline import Pipeline
 from sklearn.preprocessing import StandardScaler, OneHotEncoder
 from sklearn.compose import ColumnTransformer
-from sklearn.model_selection import cross_val_score, GridSearchCV
 from sklearn.ensemble import RandomForestClassifier
-from sklearn.metrics import classification_report
-import joblib
-# 1. Preprocessamento
-numeric_features = ['age', 'salary', 'experience']
-categorical_features = ['department', 'role']
+numeric_features = ["age", "salary", "experience"]
+categorical_features = ["department", "role"]
 preprocessor = ColumnTransformer(
     transformers=[
-        ('num', StandardScaler(), numeric_features),
-        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
+        ("num", StandardScaler(), numeric_features),
+        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
     ]
 )
-# 2. Pipeline
 pipeline = Pipeline([
-    ('preprocessor', preprocessor),
-    ('classifier', RandomForestClassifier(random_state=42))
+    ("preprocessor", preprocessor),
+    ("classifier", RandomForestClassifier(random_state=42)),
 ])
+```
-# 3. Cross-validation
-scores = cross_val_score(pipeline, X_train, y_train, cv=5, scoring='f1_weighted')
-print(f"F1 Score: {scores.mean():.3f} (+/- {scores.std():.3f})")
+### Step 4: Run Cross-Validation
+```python
+from sklearn.model_selection import cross_val_score
+scores = cross_val_score(pipeline, X_train, y_train, cv=5, scoring="f1_weighted")
+print(f"F1 Score (5-fold CV): {scores.mean():.3f} (+/- {scores.std():.3f})")
+```
+### Step 5: Compare Against Baseline
+```python
+from sklearn.dummy import DummyClassifier
+baseline = DummyClassifier(strategy="most_frequent")
+baseline.fit(X_train.select_dtypes(include="number"), y_train)
+baseline_score = baseline.score(X_test.select_dtypes(include="number"), y_test)
+print(f"Baseline accuracy: {baseline_score:.3f}")
+```
+### Step 6: Hyperparameter Tuning
+```python
+from sklearn.model_selection import GridSearchCV
-# 4. Hyperparameter tuning
 param_grid = {
-    'classifier__n_estimators': [100, 200, 500],
-    'classifier__max_depth': [10, 20, None],
-    'classifier__min_samples_split': [2, 5, 10]
+    "classifier__n_estimators": [100, 200, 500],
+    "classifier__max_depth": [10, 20, None],
+    "classifier__min_samples_split": [2, 5, 10],
 }
-grid_search = GridSearchCV(pipeline, param_grid, cv=5, scoring='f1_weighted', n_jobs=-1)
+grid_search = GridSearchCV(
+    pipeline, param_grid, cv=5, scoring="f1_weighted", n_jobs=-1, verbose=1
+)
 grid_search.fit(X_train, y_train)
-# 5. Avaliação final
-y_pred = grid_search.predict(X_test)
-print(classification_report(y_test, y_pred))
-# 6. Salvar modelo
-joblib.dump(grid_search.best_estimator_, 'models/model_v1.pkl')
+print(f"Best params: {grid_search.best_params_}")
+print(f"Best CV score: {grid_search.best_score_:.3f}")
 ```
-## Sempre comparar com baseline
+For larger search spaces, use RandomizedSearchCV:
+```python
+from sklearn.model_selection import RandomizedSearchCV
+from scipy.stats import randint
+param_distributions = {
+    "classifier__n_estimators": randint(50, 500),
+    "classifier__max_depth": [5, 10, 20, None],
+    "classifier__min_samples_split": randint(2, 20),
+}
+random_search = RandomizedSearchCV(
+    pipeline, param_distributions, n_iter=50, cv=5,
+    scoring="f1_weighted", n_jobs=-1, random_state=42
+)
+random_search.fit(X_train, y_train)
+```
+### Step 7: Final Evaluation on Test Set
 ```python
-from sklearn.dummy import DummyClassifier
+from sklearn.metrics import classification_report, confusion_matrix
-baseline = DummyClassifier(strategy='most_frequent')
-baseline.fit(X_train, y_train)
-baseline_score = baseline.score(X_test, y_test)
-print(f"Baseline accuracy: {baseline_score:.3f}")
+y_pred = grid_search.predict(X_test)
+print(classification_report(y_test, y_pred))
+print(f"\nConfusion Matrix:\n{confusion_matrix(y_test, y_pred)}")
 print(f"Model accuracy: {grid_search.score(X_test, y_test):.3f}")
 ```
+### Step 8: Save the Best Model
+```bash
+mkdir -p models
+```
+```python
+import joblib
+best_model = grid_search.best_estimator_
+joblib.dump(best_model, "models/model_v1.pkl")
+print("Saved best model to models/model_v1.pkl")
+# Verify the saved model works
+loaded_model = joblib.load("models/model_v1.pkl")
+assert (loaded_model.predict(X_test) == y_pred).all()
+print("Model verification passed!")
+```
+## Resources
+- `references/model-selection-guide.md` - Which model to use for which problem
+- `references/evaluation-metrics.md` - Metrics reference for classification and regression
+## Examples
+### Example 1: Train a Classifier
+User asks: "Train a model to predict customer churn"
+Response approach:
+1. Load and split data with stratification
+2. Build ColumnTransformer for numeric + categorical features
+3. Create Pipeline with RandomForestClassifier
+4. Run 5-fold cross-validation to get baseline performance
+5. Compare against DummyClassifier
+6. Tune with GridSearchCV
+7. Evaluate on test set with classification_report
+8. Save best model with joblib
+### Example 2: Quick Model Comparison
+User asks: "Which algorithm works best for this dataset?"
+Response approach:
+1. Build pipelines for multiple models (RF, LogisticRegression, GradientBoosting)
+2. Run cross_val_score on each
+3. Print comparison table of mean and std scores
+4. Pick the best and run hyperparameter tuning
+5. Report final test set performance
+## Notes
+- Always compare against a baseline before claiming good performance
+- Use stratify=y in train_test_split for imbalanced classification
+- GridSearchCV for small param spaces (<100 combos), RandomizedSearchCV for larger ones
+- Never look at test set metrics until final evaluation
+- Save both the model and the preprocessing pipeline together (Pipeline does this automatically)
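
Example 2 of the model-training skill ("Quick Model Comparison") lists its steps without code. The sketch below covers only the comparison-table and selection steps, using the standard library. `rank_models` is a hypothetical helper, and the fold scores are placeholders rather than measured results; in practice each list would come from `cross_val_score(pipeline, X_train, y_train, cv=5, scoring="f1_weighted")`.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def rank_models(cv_scores: Dict[str, List[float]]) -> List[Tuple[str, float, float]]:
    """Return (name, mean, std) rows sorted by mean score, best first."""
    # pstdev is the population standard deviation, which matches NumPy's default .std().
    rows = [(name, mean(scores), pstdev(scores)) for name, scores in cv_scores.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Placeholder fold scores for illustration only, NOT real results.
cv_scores = {
    "RandomForest": [0.80, 0.82, 0.81, 0.79, 0.83],
    "LogisticRegression": [0.75, 0.77, 0.76, 0.74, 0.78],
    "GradientBoosting": [0.82, 0.85, 0.84, 0.81, 0.83],
}

ranking = rank_models(cv_scores)
print(f"{'model':<20} {'mean':>6} {'std':>6}")
for name, avg, std in ranking:
    print(f"{name:<20} {avg:>6.3f} {std:>6.3f}")

best_name = ranking[0][0]
print(f"Best by mean CV score: {best_name}")
```

Reporting the standard deviation next to the mean helps when two models are close: a gap smaller than the fold-to-fold spread is weak evidence for preferring one over the other.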

package/templates/bundle-data-pipeline/skills/model-training/references/evaluation-metrics.md ADDED

@@ -0,0 +1,52 @@
+# Evaluation Metrics Reference
+## Classification Metrics
+```python
+from sklearn.metrics import (
+    accuracy_score, precision_score, recall_score, f1_score,
+    classification_report, confusion_matrix, roc_auc_score
+)
+# All-in-one report
+print(classification_report(y_test, y_pred))
+# Individual metrics
+accuracy_score(y_test, y_pred)
+precision_score(y_test, y_pred, average='weighted')
+recall_score(y_test, y_pred, average='weighted')
+f1_score(y_test, y_pred, average='weighted')
+roc_auc_score(y_test, y_pred_proba, multi_class='ovr')
+```
+### When to Use Which
+| Metric | Use When |
+|---|---|
+| Accuracy | Balanced classes |
+| Precision | False positives are costly (spam detection) |
+| Recall | False negatives are costly (disease detection) |
+| F1 | Imbalanced classes, need balance of precision/recall |
+| ROC AUC | Need threshold-independent evaluation |
+## Regression Metrics
+```python
+from sklearn.metrics import (
+    mean_squared_error, mean_absolute_error, r2_score,
+    mean_absolute_percentage_error
+)
+mean_squared_error(y_test, y_pred)            # MSE
+mean_squared_error(y_test, y_pred) ** 0.5     # RMSE (`squared=False` is deprecated since scikit-learn 1.4)
+mean_absolute_error(y_test, y_pred)           # MAE
+r2_score(y_test, y_pred)                      # R-squared
+mean_absolute_percentage_error(y_test, y_pred) # MAPE
+```
+### When to Use Which
+| Metric | Use When |
+|---|---|
+| RMSE | Penalize large errors more |
+| MAE | Robust to outliers |
+| R-squared | Compare to baseline (0 = same as mean) |
+| MAPE | Need percentage-based interpretation |
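
The two "When to Use Which" tables in evaluation-metrics.md can be checked with a few lines of arithmetic. The sketch below uses only the standard library and made-up numbers chosen for illustration. `precision_recall_f1` and `mae_rmse` are hypothetical helpers, not scikit-learn functions, and returning 0.0 when a denominator is zero is an assumption of this sketch.

```python
from math import sqrt
from typing import Sequence, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from confusion counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def mae_rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> Tuple[float, float]:
    """Compute mean absolute error and root mean squared error."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Imbalanced classes: 95 negatives, 5 positives, and a model that always predicts
# negative (tp=0, fp=0, fn=5, tn=95). Accuracy looks good while recall and F1 are zero.
accuracy = (0 + 95) / 100
precision, recall, f1 = precision_recall_f1(tp=0, fp=0, fn=5)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")

# Regression: three errors of 1 and one error of 9. The single large error
# pulls RMSE well above MAE.
mae, rmse = mae_rmse([10, 10, 10, 10], [9, 9, 9, 1])
print(f"mae={mae:.3f} rmse={rmse:.3f}")
```

The first case shows why the table recommends accuracy only for balanced classes; the second shows RMSE penalizing the single large error more heavily than MAE does.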

package/templates/bundle-data-pipeline/skills/model-training/references/model-selection-guide.md ADDED

@@ -0,0 +1,41 @@
+# Model Selection Guide
+## Classification
+| Model | Best For | Pros | Cons |
+|---|---|---|---|
+| LogisticRegression | Binary/multi-class, linear boundaries | Fast, interpretable | Limited to linear |
+| RandomForestClassifier | General purpose, mixed features | Robust, feature importance | Slower, less interpretable |
+| GradientBoostingClassifier | High accuracy needed | Often the highest accuracy | Slow to train, overfitting risk |
+| XGBClassifier | Competitions, large data | Fast, regularization | Needs tuning |
+| SVC | Small-medium data, non-linear | Flexible kernels | Slow on large data |
+## Regression
+| Model | Best For | Pros | Cons |
+|---|---|---|---|
+| LinearRegression | Linear relationships | Fast, interpretable | Limited to linear |
+| Ridge/Lasso | Regularized linear | Handles multicollinearity | Still linear |
+| RandomForestRegressor | Non-linear, mixed features | Robust | Slower |
+| GradientBoostingRegressor | High accuracy | Often the highest accuracy | Needs tuning |
+| XGBRegressor | Large datasets | Fast, scalable | Complex tuning |
+## Quick Start Recipes
+### Binary Classification
+```python
+from sklearn.ensemble import GradientBoostingClassifier
+model = GradientBoostingClassifier(n_estimators=200, max_depth=5, random_state=42)
+```
+### Multi-class Classification
+```python
+from sklearn.ensemble import RandomForestClassifier
+model = RandomForestClassifier(n_estimators=200, random_state=42)
+```
+### Regression
+```python
+from sklearn.ensemble import GradientBoostingRegressor
+model = GradientBoostingRegressor(n_estimators=200, max_depth=5, random_state=42)
+```