omgkit 2.20.0 → 2.21.1

Files changed (73)
  1. package/README.md +125 -10
  2. package/package.json +1 -1
  3. package/plugin/agents/ai-architect-agent.md +282 -0
  4. package/plugin/agents/data-scientist-agent.md +221 -0
  5. package/plugin/agents/experiment-analyst-agent.md +318 -0
  6. package/plugin/agents/ml-engineer-agent.md +165 -0
  7. package/plugin/agents/mlops-engineer-agent.md +324 -0
  8. package/plugin/agents/model-optimizer-agent.md +287 -0
  9. package/plugin/agents/production-engineer-agent.md +360 -0
  10. package/plugin/agents/research-scientist-agent.md +274 -0
  11. package/plugin/commands/omgdata/augment.md +86 -0
  12. package/plugin/commands/omgdata/collect.md +81 -0
  13. package/plugin/commands/omgdata/label.md +83 -0
  14. package/plugin/commands/omgdata/split.md +83 -0
  15. package/plugin/commands/omgdata/validate.md +76 -0
  16. package/plugin/commands/omgdata/version.md +85 -0
  17. package/plugin/commands/omgdeploy/ab.md +94 -0
  18. package/plugin/commands/omgdeploy/cloud.md +89 -0
  19. package/plugin/commands/omgdeploy/edge.md +93 -0
  20. package/plugin/commands/omgdeploy/package.md +91 -0
  21. package/plugin/commands/omgdeploy/serve.md +92 -0
  22. package/plugin/commands/omgfeature/embed.md +93 -0
  23. package/plugin/commands/omgfeature/extract.md +93 -0
  24. package/plugin/commands/omgfeature/select.md +85 -0
  25. package/plugin/commands/omgfeature/store.md +97 -0
  26. package/plugin/commands/omgml/init.md +60 -0
  27. package/plugin/commands/omgml/status.md +82 -0
  28. package/plugin/commands/omgops/drift.md +87 -0
  29. package/plugin/commands/omgops/monitor.md +99 -0
  30. package/plugin/commands/omgops/pipeline.md +102 -0
  31. package/plugin/commands/omgops/registry.md +109 -0
  32. package/plugin/commands/omgops/retrain.md +91 -0
  33. package/plugin/commands/omgoptim/distill.md +90 -0
  34. package/plugin/commands/omgoptim/profile.md +92 -0
  35. package/plugin/commands/omgoptim/prune.md +81 -0
  36. package/plugin/commands/omgoptim/quantize.md +83 -0
  37. package/plugin/commands/omgtrain/baseline.md +78 -0
  38. package/plugin/commands/omgtrain/compare.md +99 -0
  39. package/plugin/commands/omgtrain/evaluate.md +85 -0
  40. package/plugin/commands/omgtrain/train.md +81 -0
  41. package/plugin/commands/omgtrain/tune.md +89 -0
  42. package/plugin/registry.yaml +252 -2
  43. package/plugin/skills/ml-systems/SKILL.md +65 -0
  44. package/plugin/skills/ml-systems/ai-accelerators/SKILL.md +342 -0
  45. package/plugin/skills/ml-systems/data-eng/SKILL.md +126 -0
  46. package/plugin/skills/ml-systems/deep-learning-primer/SKILL.md +143 -0
  47. package/plugin/skills/ml-systems/deployment-paradigms/SKILL.md +148 -0
  48. package/plugin/skills/ml-systems/dnn-architectures/SKILL.md +128 -0
  49. package/plugin/skills/ml-systems/edge-deployment/SKILL.md +366 -0
  50. package/plugin/skills/ml-systems/efficient-ai/SKILL.md +316 -0
  51. package/plugin/skills/ml-systems/feature-engineering/SKILL.md +151 -0
  52. package/plugin/skills/ml-systems/ml-frameworks/SKILL.md +187 -0
  53. package/plugin/skills/ml-systems/ml-serving-optimization/SKILL.md +371 -0
  54. package/plugin/skills/ml-systems/ml-systems-fundamentals/SKILL.md +103 -0
  55. package/plugin/skills/ml-systems/ml-workflow/SKILL.md +162 -0
  56. package/plugin/skills/ml-systems/mlops/SKILL.md +386 -0
  57. package/plugin/skills/ml-systems/model-deployment/SKILL.md +350 -0
  58. package/plugin/skills/ml-systems/model-dev/SKILL.md +160 -0
  59. package/plugin/skills/ml-systems/model-optimization/SKILL.md +339 -0
  60. package/plugin/skills/ml-systems/robust-ai/SKILL.md +395 -0
  61. package/plugin/skills/ml-systems/training-data/SKILL.md +152 -0
  62. package/plugin/workflows/ml-systems/data-preparation-workflow.md +276 -0
  63. package/plugin/workflows/ml-systems/edge-deployment-workflow.md +413 -0
  64. package/plugin/workflows/ml-systems/full-ml-lifecycle-workflow.md +405 -0
  65. package/plugin/workflows/ml-systems/hyperparameter-tuning-workflow.md +352 -0
  66. package/plugin/workflows/ml-systems/mlops-pipeline-workflow.md +384 -0
  67. package/plugin/workflows/ml-systems/model-deployment-workflow.md +392 -0
  68. package/plugin/workflows/ml-systems/model-development-workflow.md +218 -0
  69. package/plugin/workflows/ml-systems/model-evaluation-workflow.md +416 -0
  70. package/plugin/workflows/ml-systems/model-optimization-workflow.md +390 -0
  71. package/plugin/workflows/ml-systems/monitoring-drift-workflow.md +446 -0
  72. package/plugin/workflows/ml-systems/retraining-workflow.md +401 -0
  73. package/plugin/workflows/ml-systems/training-pipeline-workflow.md +382 -0
@@ -0,0 +1,382 @@
+ ---
+ name: Training Pipeline Workflow
+ description: Automated training pipeline workflow for reproducible model training with experiment tracking and model registration.
+ category: ml-systems
+ complexity: medium
+ agents:
+   - ml-engineer-agent
+   - mlops-engineer-agent
+ ---
+
+ # Training Pipeline Workflow
+
+ Automated pipeline for reproducible model training.
+
+ ## Overview
+
+ ```
+ ┌─────────────────────────────────────────────────────────────┐
+ │                 TRAINING PIPELINE WORKFLOW                  │
+ ├─────────────────────────────────────────────────────────────┤
+ │                                                             │
+ │  TRIGGER          DATA PREP         TRAINING                │
+ │  ────────         ─────────         ────────                │
+ │  Schedule         Load features     Train model             │
+ │  Manual           Validate          Log metrics             │
+ │  Drift detect     Split             Save checkpoint         │
+ │                                                             │
+ │  EVALUATION       REGISTRATION      NOTIFICATION            │
+ │  ──────────       ────────────      ────────────            │
+ │  Test metrics     Model registry    Slack/Email             │
+ │  Comparison       Version tag       Dashboard update        │
+ │  Quality gates    Artifacts         Next steps              │
+ │                                                             │
+ └─────────────────────────────────────────────────────────────┘
+ ```
+
+ ## Pipeline Configuration
+
+ ```yaml
+ # pipeline_config.yaml
+ pipeline:
+   name: model_training_pipeline
+   schedule: "0 2 * * 0"  # Weekly at 2 AM Sunday
+   timeout: 3600  # 1 hour
+   retries: 2
+
+ data:
+   source: feature_store
+   features:
+     - user_features
+     - transaction_features
+   target: is_churned
+   split:
+     train: 0.7
+     val: 0.15
+     test: 0.15
+
+ training:
+   model_type: xgboost
+   hyperparameters:
+     max_depth: 6
+     learning_rate: 0.1
+     n_estimators: 100
+   early_stopping:
+     patience: 10
+     metric: val_auc
+
+ evaluation:
+   metrics:
+     - accuracy
+     - precision
+     - recall
+     - f1
+     - auc
+   thresholds:
+     auc: 0.85
+     precision: 0.80
+
+ registration:
+   model_name: churn_predictor
+   auto_promote: false
+ ```
+
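The split ratios and thresholds in this configuration are easy to mistype, so it can help to sanity-check them before any step runs. The following is a minimal sketch, assuming the YAML has already been parsed into a plain dict; `validate_pipeline_config` is a hypothetical helper and not part of omgkit.

```python
import math


def validate_pipeline_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []

    # The train/val/test ratios should partition the data exactly once.
    split = config["data"]["split"]
    total = sum(split.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        problems.append(f"split ratios sum to {total}, expected 1.0")
    for name, ratio in split.items():
        if not 0.0 < ratio < 1.0:
            problems.append(f"split ratio {name}={ratio} is outside (0, 1)")

    # Every threshold should refer to a metric that is actually reported.
    reported = set(config["evaluation"]["metrics"])
    for metric, threshold in config["evaluation"]["thresholds"].items():
        if metric not in reported:
            problems.append(f"threshold set for unreported metric {metric!r}")
        if not 0.0 <= threshold <= 1.0:
            problems.append(f"threshold {metric}={threshold} is outside [0, 1]")

    return problems


# Dict mirroring the relevant parts of pipeline_config.yaml.
config = {
    "data": {"split": {"train": 0.7, "val": 0.15, "test": 0.15}},
    "evaluation": {
        "metrics": ["accuracy", "precision", "recall", "f1", "auc"],
        "thresholds": {"auc": 0.85, "precision": 0.80},
    },
}
problems = validate_pipeline_config(config)
```

Returning a list of problems rather than raising on the first one lets a pipeline log every misconfiguration in a single run.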
+ ## Steps
+
+ ### Step 1: Pipeline Trigger
+ **Agent**: mlops-engineer-agent
+
+ **Triggers**:
+ - Scheduled (cron)
+ - Manual trigger
+ - Drift detection alert
+ - New data arrival
+ - CI/CD push
+
+ **Actions**:
+ ```bash
+ # Create/update pipeline
+ /omgops:pipeline --config pipeline_config.yaml --action create
+
+ # Manual trigger
+ /omgops:pipeline --name model_training_pipeline --action run
+ ```
+
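For the scheduled trigger, the cron expression `0 2 * * 0` in the configuration means minute 0, hour 2, any day of month, any month, Sunday. A small sketch of how the next run time could be computed for that weekly pattern, using only the standard library; `next_weekly_run` is a hypothetical helper for illustration and does not parse general cron syntax.

```python
from datetime import datetime, timedelta


def next_weekly_run(now: datetime, weekday: int = 6, hour: int = 2) -> datetime:
    """Next occurrence of `weekday` at `hour`:00 strictly after `now`.

    `weekday` follows datetime.weekday(): Monday=0 ... Sunday=6,
    so cron's day-of-week 0 (Sunday) corresponds to 6 here.
    """
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # Already at or past this week's slot, so move to next week.
        candidate += timedelta(days=7)
    return candidate


# 2024-01-03 is a Wednesday; the following Sunday is 2024-01-07.
upcoming = next_weekly_run(datetime(2024, 1, 3, 12, 0))
```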
+ ### Step 2: Data Preparation
+ **Agent**: ml-engineer-agent
+
+ **Inputs**:
+ - Feature store reference
+ - Data version
+ - Split configuration
+
+ **Actions**:
+ ```python
+ # Pipeline step: data_preparation
+ # Assumes feature_store, entity_df, validate_data, split_data and
+ # DataValidationError are provided by the surrounding pipeline module.
+ def prepare_data(config):
+     # Load features from feature store
+     features = feature_store.get_historical_features(
+         entity_df=entity_df,
+         features=config['data']['features']
+     )
+
+     # Validate data (pipeline_config.yaml must also define data.schema)
+     validation_result = validate_data(features, config['data']['schema'])
+     if not validation_result.passed:
+         raise DataValidationError(validation_result.errors)
+
+     # Split data, stratified on the target column
+     train, val, test = split_data(
+         features,
+         ratios=config['data']['split'],
+         stratify=config['data']['target']
+     )
+
+     return train, val, test
+ ```
+
+ **Outputs**:
+ - Prepared datasets
+ - Data validation report
+ - Split statistics
+
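The workflow does not show `split_data`. One way a stratified split could look is sketched below, under the assumption that the dataset is a list of dict rows and the label values are sortable; the real helper may work on DataFrames and behave differently.

```python
import random
from collections import defaultdict


def split_data(rows, ratios, stratify, seed=42):
    """Split rows into train/val/test while keeping label proportions similar."""
    # Group rows by the value of the stratification column.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[stratify]].append(row)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(len(group) * ratios["train"])
        n_val = round(len(group) * ratios["val"])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Synthetic rows for illustration only: 80 retained users, 20 churned.
rows = [{"id": i, "is_churned": int(i >= 80)} for i in range(100)]
train, val, test = split_data(
    rows,
    ratios={"train": 0.7, "val": 0.15, "test": 0.15},
    stratify="is_churned",
)
```

Splitting each label group separately is what keeps the churn rate roughly equal across the three sets, which matters when the positive class is rare.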
+ ### Step 3: Model Training
+ **Agent**: ml-engineer-agent
+
+ **Inputs**:
+ - Training data
+ - Hyperparameters
+ - Training configuration
+
+ **Actions**:
+ ```bash
+ # Execute training
+ /omgtrain:train --config pipeline_config.yaml --experiment-name weekly_training
+ ```
+
+ ```python
+ # Pipeline step: train_model
+ from datetime import datetime
+
+ import mlflow
+ import mlflow.xgboost
+ from xgboost import XGBClassifier
+
+
+ def train_model(train_data, val_data, config):
+     with mlflow.start_run(run_name=f"train_{datetime.now().isoformat()}") as run:
+         # Log parameters
+         mlflow.log_params(config['training']['hyperparameters'])
+
+         # Initialize model; XGBoost 2.0+ takes the early stopping settings
+         # in the constructor rather than in fit()
+         model = XGBClassifier(
+             **config['training']['hyperparameters'],
+             eval_metric='auc',  # early_stopping.metric is val_auc
+             early_stopping_rounds=config['training']['early_stopping']['patience']
+         )
+
+         # Train with early stopping on the validation set
+         model.fit(
+             train_data.X, train_data.y,
+             eval_set=[(val_data.X, val_data.y)]
+         )
+
+         # Log per-iteration validation metrics
+         for metric, values in model.evals_result()['validation_0'].items():
+             for i, v in enumerate(values):
+                 mlflow.log_metric(f"val_{metric}", v, step=i)
+
+         # Save model checkpoint
+         mlflow.xgboost.log_model(model, "model")
+
+         return model, run.info.run_id
+ ```
+
+ **Outputs**:
+ - Trained model
+ - Training metrics
+ - MLflow run ID
+
+ ### Step 4: Evaluation
+ **Agent**: experiment-analyst-agent
+
+ **Inputs**:
+ - Trained model
+ - Test dataset
+ - Evaluation thresholds
+
+ **Actions**:
+ ```bash
+ # Evaluate model
+ /omgtrain:evaluate --run-id <run_id> --data test.csv --thresholds thresholds.yaml
+ ```
+
+ ```python
+ # Pipeline step: evaluate_model
+ from sklearn.metrics import (
+     accuracy_score, f1_score, precision_score, recall_score, roc_auc_score
+ )
+
+
+ def evaluate_model(model, test_data, config):
+     predictions = model.predict(test_data.X)
+     probabilities = model.predict_proba(test_data.X)[:, 1]
+
+     metrics = {
+         'accuracy': accuracy_score(test_data.y, predictions),
+         'precision': precision_score(test_data.y, predictions),
+         'recall': recall_score(test_data.y, predictions),
+         'f1': f1_score(test_data.y, predictions),
+         'auc': roc_auc_score(test_data.y, probabilities)
+     }
+
+     # Check quality gates: every thresholded metric must meet its minimum
+     quality_passed = all(
+         metrics[metric] >= threshold
+         for metric, threshold in config['evaluation']['thresholds'].items()
+     )
+
+     return metrics, quality_passed
+ ```
+
+ **Outputs**:
+ - Evaluation metrics
+ - Quality gate results
+ - Error analysis
+
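The quality gate in this step reduces to a single boolean, which hides which metric fell short. A possible variant that also reports the failing gates is sketched below; `check_quality_gates` is a hypothetical name and the metric values are made-up placeholders, not results from any run.

```python
def check_quality_gates(metrics, thresholds):
    """Return (passed, failures); failures maps metric -> (value, minimum)."""
    failures = {
        name: (metrics[name], minimum)
        for name, minimum in thresholds.items()
        if metrics[name] < minimum
    }
    return not failures, failures


# Placeholder metrics for illustration; thresholds match pipeline_config.yaml.
metrics = {"accuracy": 0.91, "precision": 0.78, "recall": 0.74, "f1": 0.76, "auc": 0.88}
thresholds = {"auc": 0.85, "precision": 0.80}

passed, failures = check_quality_gates(metrics, thresholds)
for name, (value, minimum) in failures.items():
    print(f"gate failed: {name}={value:.2f} < {minimum:.2f}")
```

Carrying the failure details forward gives the notification step something more specific to report than a pass/fail flag.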
+ ### Step 5: Model Registration
+ **Agent**: mlops-engineer-agent
+
+ **Inputs**:
+ - Trained model
+ - Evaluation results
+ - Registration configuration
+
+ **Actions**:
+ ```bash
+ # Register model
+ /omgops:registry --run-id <run_id> --model-name churn_predictor --stage staging
+ ```
+
+ ```python
+ # Pipeline step: register_model
+ import json
+ import logging
+
+ import mlflow
+ from mlflow.tracking import MlflowClient
+
+
+ def register_model(run_id, metrics, quality_passed, config):
+     # quality_passed is the second value returned by evaluate_model;
+     # the metrics dict itself holds only the metric values
+     if not quality_passed:
+         logging.warning("Quality gates not passed, skipping registration")
+         return None
+
+     # Register model version
+     model_version = mlflow.register_model(
+         f"runs:/{run_id}/model",
+         config['registration']['model_name']
+     )
+
+     # Add metadata
+     client = MlflowClient()
+     client.set_model_version_tag(
+         name=config['registration']['model_name'],
+         version=model_version.version,
+         key="metrics",
+         value=json.dumps(metrics)
+     )
+
+     # Auto-promote to Staging if configured (model stages are deprecated
+     # in newer MLflow releases in favor of aliases)
+     if config['registration']['auto_promote']:
+         client.transition_model_version_stage(
+             name=config['registration']['model_name'],
+             version=model_version.version,
+             stage="Staging"
+         )
+
+     return model_version
+ ```
+
+ **Outputs**:
+ - Registered model version
+ - Model artifacts
+ - Promotion status
+
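The "Promotion status" output depends on two inputs: whether the quality gates passed and whether `auto_promote` is set. The decision can be written as a small pure function, sketched here with hypothetical status labels that the workflow itself does not define.

```python
def promotion_status(quality_passed: bool, auto_promote: bool) -> str:
    """Map the two registration inputs to a status label (labels are illustrative)."""
    if not quality_passed:
        return "skipped"      # model is not registered at all
    if auto_promote:
        return "staging"      # registered and moved to the Staging stage
    return "registered"       # registered, awaiting manual promotion


# With the sample configuration (auto_promote: false) a passing model
# is registered but not promoted.
status = promotion_status(quality_passed=True, auto_promote=False)
```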
+ ### Step 6: Notification
+ **Agent**: mlops-engineer-agent
+
+ **Inputs**:
+ - Pipeline results
+ - Metrics
+ - Status
+
+ **Actions**:
+ ```python
+ # Pipeline step: notify
+ # Assumes send_slack_notification and update_training_dashboard are
+ # helpers provided elsewhere in the pipeline code.
+ def notify_completion(results):
+     message = f"""
+     🤖 Training Pipeline Complete
+
+     Model: {results['model_name']}
+     Version: {results['version']}
+     Status: {'✅ Passed' if results['quality_passed'] else '❌ Failed'}
+
+     Metrics:
+     - AUC: {results['metrics']['auc']:.4f}
+     - F1: {results['metrics']['f1']:.4f}
+
+     Next: {'Ready for review' if results['quality_passed'] else 'Investigate failures'}
+     """
+
+     # Send to Slack
+     send_slack_notification(message, channel="#ml-alerts")
+
+     # Update dashboard
+     update_training_dashboard(results)
+ ```
+
+ **Outputs**:
+ - Notifications sent
+ - Dashboard updated
+ - Logs archived
+
+ ## Airflow DAG
+
+ ```python
+ from datetime import datetime
+
+ from airflow import DAG
+ from airflow.operators.python import PythonOperator
+
+ # prepare_data, train_model, evaluate_model, register_model and
+ # notify_completion are the step functions defined above. They take
+ # arguments, so supply them via op_kwargs or wrap them before scheduling.
+
+ with DAG(
+     'model_training_pipeline',
+     schedule_interval='0 2 * * 0',  # named `schedule` from Airflow 2.4
+     start_date=datetime(2024, 1, 1),  # placeholder; Airflow needs a start_date
+     catchup=False
+ ) as dag:
+
+     prepare = PythonOperator(
+         task_id='prepare_data',
+         python_callable=prepare_data
+     )
+
+     train = PythonOperator(
+         task_id='train_model',
+         python_callable=train_model
+     )
+
+     evaluate = PythonOperator(
+         task_id='evaluate_model',
+         python_callable=evaluate_model
+     )
+
+     register = PythonOperator(
+         task_id='register_model',
+         python_callable=register_model
+     )
+
+     notify = PythonOperator(
+         task_id='notify',
+         python_callable=notify_completion
+     )
+
+     prepare >> train >> evaluate >> register >> notify
+ ```
+
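The DAG fixes the order of the steps but not how data moves between them. The sketch below spells out that data flow in plain Python, with the step functions injected as a dict so it can be exercised with stubs. `run_pipeline` is a hypothetical driver, and it assumes `register_model` receives the quality-gate result as its own argument; the stub return values are placeholders.

```python
def run_pipeline(config, steps):
    """Run the five steps in DAG order, passing each step's outputs to the next."""
    train, val, test = steps["prepare_data"](config)
    model, run_id = steps["train_model"](train, val, config)
    metrics, quality_passed = steps["evaluate_model"](model, test, config)
    version = steps["register_model"](run_id, metrics, quality_passed, config)

    # Shape matches what the notification step reads from `results`.
    results = {
        "model_name": config["registration"]["model_name"],
        "version": version,
        "metrics": metrics,
        "quality_passed": quality_passed,
    }
    steps["notify"](results)
    return results


# Stub steps standing in for the real implementations.
sent = []
stub_steps = {
    "prepare_data": lambda config: ("train", "val", "test"),
    "train_model": lambda train, val, config: ("model", "run-123"),
    "evaluate_model": lambda model, test, config: ({"auc": 0.9, "f1": 0.8}, True),
    "register_model": lambda run_id, metrics, passed, config: 1 if passed else None,
    "notify": sent.append,
}
results = run_pipeline({"registration": {"model_name": "churn_predictor"}}, stub_steps)
```

In a real Airflow deployment the intermediate datasets and model would usually be written to storage and passed between tasks by reference, since in-memory objects do not cross task boundaries.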
+ ## Artifacts
+
+ - `pipeline_config.yaml` - Pipeline configuration
+ - `mlflow/` - Experiment tracking
+ - `models/` - Model artifacts
+ - `logs/` - Pipeline logs
+ - `reports/` - Evaluation reports
+
+ ## Next Workflows
+
+ After the training pipeline:
+ - → **model-evaluation-workflow** for detailed analysis
+ - → **model-deployment-workflow** for production deployment
+
+ ## Quality Gates
+
+ - [ ] All steps completed successfully
+ - [ ] Metrics meet defined thresholds
+ - [ ] Documentation updated
+ - [ ] Artifacts versioned and stored
+ - [ ] Stakeholder approval obtained