npm - omgkit - Versions diffs - 2.19.3 → 2.21.0 - Mend

omgkit 2.19.3 → 2.21.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (73)

package/README.md +537 -338
package/package.json +2 -2
package/plugin/agents/ai-architect-agent.md +282 -0
package/plugin/agents/data-scientist-agent.md +221 -0
package/plugin/agents/experiment-analyst-agent.md +318 -0
package/plugin/agents/ml-engineer-agent.md +165 -0
package/plugin/agents/mlops-engineer-agent.md +324 -0
package/plugin/agents/model-optimizer-agent.md +287 -0
package/plugin/agents/production-engineer-agent.md +360 -0
package/plugin/agents/research-scientist-agent.md +274 -0
package/plugin/commands/omgdata/augment.md +86 -0
package/plugin/commands/omgdata/collect.md +81 -0
package/plugin/commands/omgdata/label.md +83 -0
package/plugin/commands/omgdata/split.md +83 -0
package/plugin/commands/omgdata/validate.md +76 -0
package/plugin/commands/omgdata/version.md +85 -0
package/plugin/commands/omgdeploy/ab.md +94 -0
package/plugin/commands/omgdeploy/cloud.md +89 -0
package/plugin/commands/omgdeploy/edge.md +93 -0
package/plugin/commands/omgdeploy/package.md +91 -0
package/plugin/commands/omgdeploy/serve.md +92 -0
package/plugin/commands/omgfeature/embed.md +93 -0
package/plugin/commands/omgfeature/extract.md +93 -0
package/plugin/commands/omgfeature/select.md +85 -0
package/plugin/commands/omgfeature/store.md +97 -0
package/plugin/commands/omgml/init.md +60 -0
package/plugin/commands/omgml/status.md +82 -0
package/plugin/commands/omgops/drift.md +87 -0
package/plugin/commands/omgops/monitor.md +99 -0
package/plugin/commands/omgops/pipeline.md +102 -0
package/plugin/commands/omgops/registry.md +109 -0
package/plugin/commands/omgops/retrain.md +91 -0
package/plugin/commands/omgoptim/distill.md +90 -0
package/plugin/commands/omgoptim/profile.md +92 -0
package/plugin/commands/omgoptim/prune.md +81 -0
package/plugin/commands/omgoptim/quantize.md +83 -0
package/plugin/commands/omgtrain/baseline.md +78 -0
package/plugin/commands/omgtrain/compare.md +99 -0
package/plugin/commands/omgtrain/evaluate.md +85 -0
package/plugin/commands/omgtrain/train.md +81 -0
package/plugin/commands/omgtrain/tune.md +89 -0
package/plugin/registry.yaml +252 -2
package/plugin/skills/ml-systems/SKILL.md +65 -0
package/plugin/skills/ml-systems/ai-accelerators/SKILL.md +342 -0
package/plugin/skills/ml-systems/data-eng/SKILL.md +126 -0
package/plugin/skills/ml-systems/deep-learning-primer/SKILL.md +143 -0
package/plugin/skills/ml-systems/deployment-paradigms/SKILL.md +148 -0
package/plugin/skills/ml-systems/dnn-architectures/SKILL.md +128 -0
package/plugin/skills/ml-systems/edge-deployment/SKILL.md +366 -0
package/plugin/skills/ml-systems/efficient-ai/SKILL.md +316 -0
package/plugin/skills/ml-systems/feature-engineering/SKILL.md +151 -0
package/plugin/skills/ml-systems/ml-frameworks/SKILL.md +187 -0
package/plugin/skills/ml-systems/ml-serving-optimization/SKILL.md +371 -0
package/plugin/skills/ml-systems/ml-systems-fundamentals/SKILL.md +103 -0
package/plugin/skills/ml-systems/ml-workflow/SKILL.md +162 -0
package/plugin/skills/ml-systems/mlops/SKILL.md +386 -0
package/plugin/skills/ml-systems/model-deployment/SKILL.md +350 -0
package/plugin/skills/ml-systems/model-dev/SKILL.md +160 -0
package/plugin/skills/ml-systems/model-optimization/SKILL.md +339 -0
package/plugin/skills/ml-systems/robust-ai/SKILL.md +395 -0
package/plugin/skills/ml-systems/training-data/SKILL.md +152 -0
package/plugin/workflows/ml-systems/data-preparation-workflow.md +276 -0
package/plugin/workflows/ml-systems/edge-deployment-workflow.md +413 -0
package/plugin/workflows/ml-systems/full-ml-lifecycle-workflow.md +405 -0
package/plugin/workflows/ml-systems/hyperparameter-tuning-workflow.md +352 -0
package/plugin/workflows/ml-systems/mlops-pipeline-workflow.md +384 -0
package/plugin/workflows/ml-systems/model-deployment-workflow.md +392 -0
package/plugin/workflows/ml-systems/model-development-workflow.md +218 -0
package/plugin/workflows/ml-systems/model-evaluation-workflow.md +416 -0
package/plugin/workflows/ml-systems/model-optimization-workflow.md +390 -0
package/plugin/workflows/ml-systems/monitoring-drift-workflow.md +446 -0
package/plugin/workflows/ml-systems/retraining-workflow.md +401 -0
package/plugin/workflows/ml-systems/training-pipeline-workflow.md +382 -0

package/plugin/skills/ml-systems/model-deployment/SKILL.md ADDED

@@ -0,0 +1,350 @@
+---
+name: model-deployment
+description: Model deployment strategies including serving infrastructure, containerization, model packaging, versioning, and production deployment patterns.
+---
+# Model Deployment
+Deploying ML models to production.
+## Deployment Architecture
+```
+┌─────────────────────────────────────────────────────────────┐
+│                   ML DEPLOYMENT PATTERNS                     │
+├─────────────────────────────────────────────────────────────┤
+│                                                              │
+│  BATCH INFERENCE       REAL-TIME          STREAMING         │
+│  ───────────────       ─────────          ─────────         │
+│  Spark/Airflow         REST/gRPC          Kafka/Flink       │
+│  High throughput       Low latency        Continuous        │
+│  Scheduled runs        On-demand          Event-driven      │
+│                                                              │
+│  EMBEDDED              EDGE               SERVERLESS        │
+│  ────────              ────               ──────────        │
+│  Mobile SDK            IoT devices        AWS Lambda        │
+│  On-device             Local inference    Auto-scaling      │
+│  Offline capable       Bandwidth limited  Pay per request   │
+│                                                              │
+└─────────────────────────────────────────────────────────────┘
+```
+## Model Serving Frameworks
+### TorchServe
+```python
+# Handler for TorchServe
+from ts.torch_handler.base_handler import BaseHandler
+import torch
+class ModelHandler(BaseHandler):
+    def initialize(self, context):
+        self.manifest = context.manifest
+        model_dir = context.system_properties.get("model_dir")
+        self.model = torch.jit.load(f"{model_dir}/model.pt")
+        self.model.eval()
+        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+        self.model.to(self.device)
+    def preprocess(self, data):
+        inputs = []
+        for row in data:
+            input_data = row.get("data") or row.get("body")
+            inputs.append(torch.tensor(input_data))
+        return torch.stack(inputs).to(self.device)
+    def inference(self, data):
+        with torch.no_grad():
+            return self.model(data)
+    def postprocess(self, inference_output):
+        return inference_output.tolist()
+# Package model
+# torch-model-archiver --model-name model --version 1.0 \
+#   --serialized-file model.pt --handler handler.py
+```
+### TensorFlow Serving
+```python
+import tensorflow as tf
+# Save model in SavedModel format
+tf.saved_model.save(model, "saved_model/1")
+# Serve with Docker
+# docker run -p 8501:8501 \
+#   -v /path/to/saved_model:/models/model \
+#   -e MODEL_NAME=model \
+#   tensorflow/serving
+# Client request
+import requests
+import json
+data = {"instances": [[1.0, 2.0, 3.0]]}
+response = requests.post(
+    "http://localhost:8501/v1/models/model:predict",
+    json=data
+)
+predictions = response.json()["predictions"]
+```
+### Triton Inference Server
+```python
+# Model repository structure
+# models/
+#   model_name/
+#     config.pbtxt
+#     1/
+#       model.onnx
+# config.pbtxt
+"""
+name: "my_model"
+platform: "onnxruntime_onnx"
+max_batch_size: 64
+input [
+  {
+    name: "input"
+    data_type: TYPE_FP32
+    # Batch dimension is implicit when max_batch_size > 0, so dims omit it
+    dims: [ 784 ]
+  }
+]
+output [
+  {
+    name: "output"
+    data_type: TYPE_FP32
+    dims: [ 10 ]
+  }
+]
+instance_group [
+  { count: 2, kind: KIND_GPU }
+]
+dynamic_batching {
+  preferred_batch_size: [ 16, 32 ]
+  max_queue_delay_microseconds: 100
+}
+"""
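+# Python client
+import numpy as np
+import tritonclient.grpc as grpcclient
+client = grpcclient.InferenceServerClient("localhost:8001")
+# Placeholder batch of one 784-feature input; replace with real data
+input_data = np.zeros((1, 784), dtype=np.float32)
+inputs = [grpcclient.InferInput("input", [1, 784], "FP32")]
+inputs[0].set_data_from_numpy(input_data)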
+outputs = [grpcclient.InferRequestedOutput("output")]
+result = client.infer("my_model", inputs, outputs=outputs)
+```
+## Containerization
+### Docker for ML
+```dockerfile
+# Multi-stage build for production
+FROM python:3.10-slim AS builder
+WORKDIR /app
+COPY requirements.txt .
+RUN pip install --user --no-cache-dir -r requirements.txt
+FROM python:3.10-slim
+# Non-root user for security
+RUN useradd -m -u 1000 appuser
+USER appuser
+WORKDIR /app
+COPY --from=builder /root/.local /home/appuser/.local
+COPY --chown=appuser:appuser . .
+ENV PATH=/home/appuser/.local/bin:$PATH
+ENV MODEL_PATH=/app/models/model.pt
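+# python:3.10-slim does not ship curl, so probe with the Python standard library
+HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
+  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1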
+EXPOSE 8000
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
+```
+### Kubernetes Deployment
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: ml-model
+spec:
+  replicas: 3
+  selector:
+    matchLabels:
+      app: ml-model
+  template:
+    metadata:
+      labels:
+        app: ml-model
+    spec:
+      containers:
+      - name: model
+        image: ml-model:v1.0
+        resources:
+          requests:
+            memory: "2Gi"
+            cpu: "1"
+            nvidia.com/gpu: 1
+          limits:
+            memory: "4Gi"
+            cpu: "2"
+            nvidia.com/gpu: 1
+        ports:
+        - containerPort: 8000
+        livenessProbe:
+          httpGet:
+            path: /health
+            port: 8000
+          initialDelaySeconds: 30
+          periodSeconds: 10
+        readinessProbe:
+          httpGet:
+            path: /ready
+            port: 8000
+          initialDelaySeconds: 5
+          periodSeconds: 5
+        env:
+        - name: MODEL_VERSION
+          value: "1.0"
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: ml-model-service
+spec:
+  selector:
+    app: ml-model
+  ports:
+  - port: 80
+    targetPort: 8000
+  type: LoadBalancer
+---
+apiVersion: autoscaling/v2
+kind: HorizontalPodAutoscaler
+metadata:
+  name: ml-model-hpa
+spec:
+  scaleTargetRef:
+    apiVersion: apps/v1
+    kind: Deployment
+    name: ml-model
+  minReplicas: 2
+  maxReplicas: 10
+  metrics:
+  - type: Resource
+    resource:
+      name: cpu
+      target:
+        type: Utilization
+        averageUtilization: 70
+```
+## FastAPI Model Server
+```python
+from fastapi import FastAPI, HTTPException
+from pydantic import BaseModel
+import torch
+import numpy as np
+app = FastAPI(title="ML Model API", version="1.0")
+class PredictionRequest(BaseModel):
+    features: list[float]
+class PredictionResponse(BaseModel):
+    prediction: int
+    confidence: float
+    model_version: str
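+# Module-level placeholder so /ready returns 503 instead of raising NameError before load
+model = None
+# Load model on startup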
+@app.on_event("startup")
+async def load_model():
+    global model
+    model = torch.jit.load("model.pt")
+    model.eval()
+@app.get("/health")
+async def health():
+    return {"status": "healthy"}
+@app.get("/ready")
+async def ready():
+    if model is None:
+        raise HTTPException(status_code=503, detail="Model not loaded")
+    return {"status": "ready"}
+@app.post("/predict", response_model=PredictionResponse)
+async def predict(request: PredictionRequest):
+    try:
+        input_tensor = torch.tensor([request.features])
+        with torch.no_grad():
+            output = model(input_tensor)
+            probs = torch.softmax(output, dim=1)
+            prediction = output.argmax(dim=1).item()
+            confidence = probs[0][prediction].item()
+        return PredictionResponse(
+            prediction=prediction,
+            confidence=confidence,
+            model_version="1.0"
+        )
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=str(e))
+@app.post("/batch_predict")
+async def batch_predict(requests: list[PredictionRequest]):
+    inputs = torch.tensor([r.features for r in requests])
+    with torch.no_grad():
+        outputs = model(inputs)
+    return {"predictions": outputs.argmax(dim=1).tolist()}
+```
+## Model Versioning
+```python
+import mlflow
+# Register model version
+with mlflow.start_run():
+    mlflow.sklearn.log_model(model, "model", registered_model_name="production_model")
+# Transition to production
+client = mlflow.tracking.MlflowClient()
+client.transition_model_version_stage(
+    name="production_model",
+    version=3,
+    stage="Production"
+)
+# Load production model
+model = mlflow.pyfunc.load_model("models:/production_model/Production")
+# Canary deployment
+def route_request(request, canary_percentage=10):
+    import random
+    if random.random() < canary_percentage / 100:
+        return canary_model.predict(request)
+    return production_model.predict(request)
+```
+## Commands
+- `/omgdeploy:package` - Package model
+- `/omgdeploy:serve` - Serve model
+- `/omgdeploy:cloud` - Cloud deployment
+- `/omgops:registry` - Model registry
+## Best Practices
+1. Use health and readiness probes
+2. Implement graceful shutdown
+3. Version models explicitly
+4. Monitor inference latency
+5. Use canary deployments for safety
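
Note on best practice 4 above (monitor inference latency): the skill file lists it without an example. A minimal sketch using only the standard library might look like the following. `LatencyTracker`, `track`, and `percentile` are hypothetical names introduced here for illustration and are not part of omgkit or any serving framework. The nearest-rank percentile is one simple choice among several.

```python
import math
import time
from contextlib import contextmanager


class LatencyTracker:
    """Hypothetical helper: records request durations and reports percentiles."""

    def __init__(self):
        self.samples_ms = []

    @contextmanager
    def track(self):
        # Time the wrapped block with a monotonic clock and store milliseconds.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.samples_ms.append((time.perf_counter() - start) * 1000.0)

    def percentile(self, p):
        # Nearest-rank percentile over all recorded samples, for p in (0, 100].
        if not self.samples_ms:
            raise ValueError("no samples recorded")
        ordered = sorted(self.samples_ms)
        rank = max(1, math.ceil(p / 100.0 * len(ordered)))
        return ordered[rank - 1]
```

In a server such as the FastAPI example, the model call could be wrapped in `with tracker.track():` and `tracker.percentile(95)` exposed through a metrics endpoint. A production setup would more likely export a histogram to a monitoring system than keep an unbounded in-memory list.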

package/plugin/skills/ml-systems/model-dev/SKILL.md ADDED

@@ -0,0 +1,160 @@
+---
+name: model-development
+description: Model development practices including model selection, training pipelines, hyperparameter tuning, evaluation, and model registration.
+---
+# Model Development
+Building and training ML models effectively.
+## Model Selection
+```python
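+from sklearn.linear_model import LogisticRegression
+from sklearn.ensemble import RandomForestClassifier
+from sklearn.model_selection import cross_val_score
+from xgboost import XGBClassifier
+from lightgbm import LGBMClassifier
+from catboost import CatBoostClassifier
+# X, y: feature matrix and labels, assumed to be loaded elsewhere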
+models = {
+    "logistic": LogisticRegression(),
+    "random_forest": RandomForestClassifier(),
+    "xgboost": XGBClassifier(),
+    "lightgbm": LGBMClassifier(),
+    "catboost": CatBoostClassifier(verbose=False)
+}
+results = {}
+for name, model in models.items():
+    scores = cross_val_score(model, X, y, cv=5, scoring="f1_macro")
+    results[name] = {
+        "mean": scores.mean(),
+        "std": scores.std()
+    }
+    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
+```
+## Training Pipeline
+```python
+import torch
+import torch.nn as nn
+from torch.utils.data import DataLoader
+from sklearn.metrics import accuracy_score
+class TrainingPipeline:
+    def __init__(self, model, optimizer, criterion, device):
+        self.model = model.to(device)
+        self.optimizer = optimizer
+        self.criterion = criterion
+        self.device = device
+    def train_epoch(self, dataloader):
+        self.model.train()
+        total_loss = 0
+        for batch in dataloader:
+            x, y = batch[0].to(self.device), batch[1].to(self.device)
+            self.optimizer.zero_grad()
+            output = self.model(x)
+            loss = self.criterion(output, y)
+            loss.backward()
+            torch.nn.utils.clip_grad_norm_(self.model.parameters(), 1.0)
+            self.optimizer.step()
+            total_loss += loss.item()
+        return total_loss / len(dataloader)
+    def evaluate(self, dataloader):
+        self.model.eval()
+        predictions, targets = [], []
+        with torch.no_grad():
+            for batch in dataloader:
+                x, y = batch[0].to(self.device), batch[1].to(self.device)
+                output = self.model(x)
+                predictions.extend(output.argmax(dim=1).cpu().numpy())
+                targets.extend(y.cpu().numpy())
+        return accuracy_score(targets, predictions)
+```
+## Hyperparameter Tuning
+```python
+import optuna
+from sklearn.model_selection import cross_val_score
+from xgboost import XGBClassifier
+def objective(trial):
+    params = {
+        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True),
+        "max_depth": trial.suggest_int("max_depth", 3, 10),
+        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
+        "min_child_weight": trial.suggest_int("min_child_weight", 1, 10)
+    }
+    model = XGBClassifier(**params, use_label_encoder=False, eval_metric="logloss")
+    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="f1_macro")
+    return scores.mean()
+study = optuna.create_study(direction="maximize")
+study.optimize(objective, n_trials=100)
+print(f"Best params: {study.best_params}")
+print(f"Best F1: {study.best_value:.3f}")
+```
+## Model Evaluation
+```python
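+from sklearn.metrics import (
+    average_precision_score,
+    classification_report,
+    confusion_matrix,
+    roc_auc_score,
+)
+def comprehensive_evaluation(model, X_test, y_test):
+    y_pred = model.predict(X_test)
+    # Positive-class probability; assumes a binary classifier
+    y_prob = model.predict_proba(X_test)[:, 1]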
+    # Classification metrics
+    print(classification_report(y_test, y_pred))
+    # Confusion matrix
+    cm = confusion_matrix(y_test, y_pred)
+    print(f"Confusion Matrix:\n{cm}")
+    # ROC-AUC
+    roc_auc = roc_auc_score(y_test, y_prob)
+    print(f"ROC-AUC: {roc_auc:.3f}")
+    # Precision-Recall AUC
+    pr_auc = average_precision_score(y_test, y_prob)
+    print(f"PR-AUC: {pr_auc:.3f}")
+    return {
+        "classification_report": classification_report(y_test, y_pred, output_dict=True),
+        "confusion_matrix": cm,
+        "roc_auc": roc_auc,
+        "pr_auc": pr_auc
+    }
+```
+## Model Registry
+```python
+import mlflow.sklearn
+# Register model
+with mlflow.start_run():
+    mlflow.sklearn.log_model(
+        model,
+        "model",
+        registered_model_name="churn_predictor"
+    )
+# Load registered model
+model = mlflow.pyfunc.load_model(
+    model_uri="models:/churn_predictor/Production"
+)
+```
+## Commands
+- `/omgtrain:train` - Train model
+- `/omgtrain:tune` - Hyperparameter tuning
+- `/omgtrain:evaluate` - Evaluate model
+## Best Practices
+1. Use cross-validation
+2. Tune hyperparameters systematically
+3. Evaluate on multiple metrics
+4. Check for overfitting
+5. Register successful models
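
Note on best practice 4 above (check for overfitting): the skill file does not show how. One common approach is early stopping, which halts training once validation loss stops improving. The sketch below is a framework-agnostic illustration. `EarlyStopping`, `patience`, and `min_delta` are hypothetical names chosen here and are not taken from the package.

```python
class EarlyStopping:
    """Hypothetical helper: signals when validation loss stops improving."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        # Returns True when training should stop.
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

With a loop like the `TrainingPipeline` in the diff, `step` would be called once per epoch with a validation loss, and training would break out when it returns `True`. Note that `evaluate` in the diff returns accuracy, so a validation-loss computation would have to be added, or the comparison inverted to track a score that should increase.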