omgkit 2.20.0 → 2.21.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (73)
  1. package/README.md +125 -10
  2. package/package.json +1 -1
  3. package/plugin/agents/ai-architect-agent.md +282 -0
  4. package/plugin/agents/data-scientist-agent.md +221 -0
  5. package/plugin/agents/experiment-analyst-agent.md +318 -0
  6. package/plugin/agents/ml-engineer-agent.md +165 -0
  7. package/plugin/agents/mlops-engineer-agent.md +324 -0
  8. package/plugin/agents/model-optimizer-agent.md +287 -0
  9. package/plugin/agents/production-engineer-agent.md +360 -0
  10. package/plugin/agents/research-scientist-agent.md +274 -0
  11. package/plugin/commands/omgdata/augment.md +86 -0
  12. package/plugin/commands/omgdata/collect.md +81 -0
  13. package/plugin/commands/omgdata/label.md +83 -0
  14. package/plugin/commands/omgdata/split.md +83 -0
  15. package/plugin/commands/omgdata/validate.md +76 -0
  16. package/plugin/commands/omgdata/version.md +85 -0
  17. package/plugin/commands/omgdeploy/ab.md +94 -0
  18. package/plugin/commands/omgdeploy/cloud.md +89 -0
  19. package/plugin/commands/omgdeploy/edge.md +93 -0
  20. package/plugin/commands/omgdeploy/package.md +91 -0
  21. package/plugin/commands/omgdeploy/serve.md +92 -0
  22. package/plugin/commands/omgfeature/embed.md +93 -0
  23. package/plugin/commands/omgfeature/extract.md +93 -0
  24. package/plugin/commands/omgfeature/select.md +85 -0
  25. package/plugin/commands/omgfeature/store.md +97 -0
  26. package/plugin/commands/omgml/init.md +60 -0
  27. package/plugin/commands/omgml/status.md +82 -0
  28. package/plugin/commands/omgops/drift.md +87 -0
  29. package/plugin/commands/omgops/monitor.md +99 -0
  30. package/plugin/commands/omgops/pipeline.md +102 -0
  31. package/plugin/commands/omgops/registry.md +109 -0
  32. package/plugin/commands/omgops/retrain.md +91 -0
  33. package/plugin/commands/omgoptim/distill.md +90 -0
  34. package/plugin/commands/omgoptim/profile.md +92 -0
  35. package/plugin/commands/omgoptim/prune.md +81 -0
  36. package/plugin/commands/omgoptim/quantize.md +83 -0
  37. package/plugin/commands/omgtrain/baseline.md +78 -0
  38. package/plugin/commands/omgtrain/compare.md +99 -0
  39. package/plugin/commands/omgtrain/evaluate.md +85 -0
  40. package/plugin/commands/omgtrain/train.md +81 -0
  41. package/plugin/commands/omgtrain/tune.md +89 -0
  42. package/plugin/registry.yaml +252 -2
  43. package/plugin/skills/ml-systems/SKILL.md +65 -0
  44. package/plugin/skills/ml-systems/ai-accelerators/SKILL.md +342 -0
  45. package/plugin/skills/ml-systems/data-eng/SKILL.md +126 -0
  46. package/plugin/skills/ml-systems/deep-learning-primer/SKILL.md +143 -0
  47. package/plugin/skills/ml-systems/deployment-paradigms/SKILL.md +148 -0
  48. package/plugin/skills/ml-systems/dnn-architectures/SKILL.md +128 -0
  49. package/plugin/skills/ml-systems/edge-deployment/SKILL.md +366 -0
  50. package/plugin/skills/ml-systems/efficient-ai/SKILL.md +316 -0
  51. package/plugin/skills/ml-systems/feature-engineering/SKILL.md +151 -0
  52. package/plugin/skills/ml-systems/ml-frameworks/SKILL.md +187 -0
  53. package/plugin/skills/ml-systems/ml-serving-optimization/SKILL.md +371 -0
  54. package/plugin/skills/ml-systems/ml-systems-fundamentals/SKILL.md +103 -0
  55. package/plugin/skills/ml-systems/ml-workflow/SKILL.md +162 -0
  56. package/plugin/skills/ml-systems/mlops/SKILL.md +386 -0
  57. package/plugin/skills/ml-systems/model-deployment/SKILL.md +350 -0
  58. package/plugin/skills/ml-systems/model-dev/SKILL.md +160 -0
  59. package/plugin/skills/ml-systems/model-optimization/SKILL.md +339 -0
  60. package/plugin/skills/ml-systems/robust-ai/SKILL.md +395 -0
  61. package/plugin/skills/ml-systems/training-data/SKILL.md +152 -0
  62. package/plugin/workflows/ml-systems/data-preparation-workflow.md +276 -0
  63. package/plugin/workflows/ml-systems/edge-deployment-workflow.md +413 -0
  64. package/plugin/workflows/ml-systems/full-ml-lifecycle-workflow.md +405 -0
  65. package/plugin/workflows/ml-systems/hyperparameter-tuning-workflow.md +352 -0
  66. package/plugin/workflows/ml-systems/mlops-pipeline-workflow.md +384 -0
  67. package/plugin/workflows/ml-systems/model-deployment-workflow.md +392 -0
  68. package/plugin/workflows/ml-systems/model-development-workflow.md +218 -0
  69. package/plugin/workflows/ml-systems/model-evaluation-workflow.md +416 -0
  70. package/plugin/workflows/ml-systems/model-optimization-workflow.md +390 -0
  71. package/plugin/workflows/ml-systems/monitoring-drift-workflow.md +446 -0
  72. package/plugin/workflows/ml-systems/retraining-workflow.md +401 -0
  73. package/plugin/workflows/ml-systems/training-pipeline-workflow.md +382 -0
@@ -0,0 +1,352 @@
---
name: Hyperparameter Tuning Workflow
description: Systematic hyperparameter optimization workflow using grid search, random search, Bayesian optimization, and advanced techniques.
category: ml-systems
complexity: medium
agents:
  - research-scientist-agent
  - experiment-analyst-agent
---

# Hyperparameter Tuning Workflow

Systematic optimization of model hyperparameters.

## Overview

```
┌─────────────────────────────────────────────────────────────┐
│               HYPERPARAMETER TUNING WORKFLOW                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. DEFINE           2. STRATEGY        3. SEARCH           │
│     SPACE               SELECTION          EXECUTION        │
│       ↓                    ↓                  ↓             │
│  Parameter ranges    Grid/Random        Run trials          │
│  Constraints         Bayesian/Evol      Early stopping      │
│  Prior knowledge     Resource budget    Parallelization     │
│                                                             │
│  4. ANALYSIS         5. VALIDATION      6. DOCUMENTATION    │
│       ↓                    ↓                  ↓             │
│  Best params         Cross-validate     Save config         │
│  Importance          Holdout test       Log results         │
│  Sensitivity         Stability check    Best practices      │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

## Steps

### Step 1: Define Search Space
**Agent**: research-scientist-agent

**Inputs**:
- Model type
- Domain knowledge
- Computational budget

**Actions**:
```python
# Define search space
search_space = {
    # Continuous parameters (log scale for learning rate)
    "learning_rate": {
        "type": "float",
        "low": 1e-5,
        "high": 1e-1,
        "log": True
    },

    # Integer parameters
    "max_depth": {
        "type": "int",
        "low": 3,
        "high": 15
    },

    "n_estimators": {
        "type": "int",
        "low": 50,
        "high": 500
    },

    # Categorical parameters
    "booster": {
        "type": "categorical",
        "choices": ["gbtree", "gblinear", "dart"]
    },

    # Conditional parameters
    "subsample": {
        "type": "float",
        "low": 0.5,
        "high": 1.0,
        "condition": "booster == 'gbtree'"
    }
}
```

**Outputs**:
- Search space definition
- Parameter constraints
- Prior distributions

### Step 2: Strategy Selection
**Agent**: research-scientist-agent

**Inputs**:
- Search space
- Computational budget
- Time constraints

**Strategy Selection**:
```python
def select_tuning_strategy(search_space, budget, time_hours):
    n_params = len(search_space)
    n_combinations = estimate_combinations(search_space)

    if n_combinations < 100:
        return "grid_search"
    elif budget < 50:
        return "random_search"
    elif n_params < 10:
        return "bayesian_tpe"
    elif time_hours < 2:
        return "hyperband"
    else:
        return "optuna_multiobj"

strategies = {
    "grid_search": "Exhaustive, best for small spaces",
    "random_search": "Good baseline, parallelizable",
    "bayesian_tpe": "Efficient, learns from trials",
    "hyperband": "Fast, early stopping",
    "optuna_multiobj": "Multi-objective optimization"
}
```

**Outputs**:
- Selected strategy
- Resource allocation
- Parallelization plan
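
`estimate_combinations` is not defined in the snippet above; a minimal sketch consistent with the Step 1 space schema is below. Discretizing continuous ranges to `grid_points` values is an assumption made here for illustration:

```python
def estimate_combinations(search_space, grid_points=10):
    """Rough size of an exhaustive grid over the search space.

    Categorical parameters contribute their choice count; integer
    ranges use the true range when it is smaller than grid_points;
    float ranges are discretized to grid_points values.
    """
    total = 1
    for spec in search_space.values():
        if spec["type"] == "categorical":
            total *= len(spec["choices"])
        elif spec["type"] == "int":
            total *= min(spec["high"] - spec["low"] + 1, grid_points)
        else:  # float
            total *= grid_points
    return total

# Demo space in the Step 1 schema: 10 * 10 * 3 grid points
demo_space = {
    "learning_rate": {"type": "float", "low": 1e-5, "high": 1e-1, "log": True},
    "max_depth": {"type": "int", "low": 3, "high": 15},
    "booster": {"type": "categorical", "choices": ["gbtree", "gblinear", "dart"]},
}
n = estimate_combinations(demo_space)
```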

### Step 3: Search Execution
**Agent**: research-scientist-agent

**Actions**:
```bash
# Run hyperparameter tuning
/omgtrain:tune --model xgboost --space space.yaml --trials 100 --strategy bayesian
```

**Optuna Implementation**:
```python
import numpy as np
import optuna
from optuna.pruners import MedianPruner
from optuna.samplers import TPESampler
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from xgboost import XGBClassifier

# X, y: feature matrix and labels from the data-preparation workflow
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

def objective(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True),
        "max_depth": trial.suggest_int("max_depth", 3, 15),
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "reg_alpha": trial.suggest_float("reg_alpha", 1e-8, 10.0, log=True),
        "reg_lambda": trial.suggest_float("reg_lambda", 1e-8, 10.0, log=True),
    }

    # early_stopping_rounds moved to the constructor in xgboost >= 2.0
    model = XGBClassifier(**params, eval_metric="logloss", early_stopping_rounds=50)

    # Cross-validation with early stopping
    scores = []
    for fold, (train_idx, val_idx) in enumerate(kfold.split(X, y)):
        X_train, X_val = X[train_idx], X[val_idx]
        y_train, y_val = y[train_idx], y[val_idx]

        model.fit(
            X_train, y_train,
            eval_set=[(X_val, y_val)],
            verbose=False
        )

        score = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
        scores.append(score)

        # Report the running mean so the pruner can stop weak trials early
        trial.report(np.mean(scores), fold)
        if trial.should_prune():
            raise optuna.TrialPruned()

    return np.mean(scores)

# Create study
study = optuna.create_study(
    direction="maximize",
    sampler=TPESampler(seed=42),
    pruner=MedianPruner(n_startup_trials=10, n_warmup_steps=3)
)

# Optimize
study.optimize(objective, n_trials=100, n_jobs=4, show_progress_bar=True)
```

**Outputs**:
- Trial results
- Best parameters
- Optimization history

### Step 4: Analysis
**Agent**: experiment-analyst-agent

**Inputs**:
- All trial results
- Best parameters
- Optimization history

**Actions**:
```python
import optuna

# Analyze tuning results
def analyze_tuning_results(study):
    # Best parameters
    print(f"Best trial: {study.best_trial.number}")
    print(f"Best value: {study.best_value:.4f}")
    print(f"Best params: {study.best_params}")

    # Parameter importance
    importance = optuna.importance.get_param_importances(study)
    print("\nParameter Importance:")
    for param, imp in sorted(importance.items(), key=lambda x: -x[1]):
        print(f"  {param}: {imp:.4f}")

    # Visualization (each call returns a Plotly figure)
    fig1 = optuna.visualization.plot_optimization_history(study)
    fig2 = optuna.visualization.plot_param_importances(study)
    fig3 = optuna.visualization.plot_parallel_coordinate(study)
    fig4 = optuna.visualization.plot_contour(study, params=["learning_rate", "max_depth"])

    return {
        "best_params": study.best_params,
        "best_value": study.best_value,
        "importance": importance,
        "n_trials": len(study.trials),
        "n_pruned": len([t for t in study.trials if t.state == optuna.trial.TrialState.PRUNED])
    }
```

**Outputs**:
- Parameter importance
- Optimization curves
- Sensitivity analysis

### Step 5: Validation
**Agent**: experiment-analyst-agent

**Inputs**:
- Best parameters
- Validation strategy
- Stability requirements

**Actions**:
```python
import numpy as np
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Validate best parameters
def validate_best_params(best_params, X, y, n_seeds=5):
    results = []

    for seed in range(n_seeds):
        # Train with different seeds
        model = XGBClassifier(**best_params, random_state=seed)

        # Full cross-validation
        scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")

        results.append({
            "seed": seed,
            "mean_cv": scores.mean(),
            "std_cv": scores.std(),
            "scores": scores.tolist()
        })

    # Aggregate results
    all_means = [r["mean_cv"] for r in results]

    return {
        "overall_mean": np.mean(all_means),
        "overall_std": np.std(all_means),
        "min": np.min(all_means),
        "max": np.max(all_means),
        "stable": np.std(all_means) < 0.02,  # <2% variance
        "individual_results": results
    }
```

**Outputs**:
- Stability metrics
- Confidence intervals
- Final recommendation

### Step 6: Documentation
**Agent**: research-scientist-agent

**Inputs**:
- All tuning results
- Best configuration
- Analysis

**Actions**:
```python
from datetime import datetime

import yaml

# study, search_space, importance, and validation_results come from Steps 1-5

# Generate tuning report
tuning_report = {
    "metadata": {
        "model_type": "XGBClassifier",
        "date": datetime.now().isoformat(),
        "n_trials": 100,
        "strategy": "TPE + MedianPruner"
    },
    "search_space": search_space,
    "best_params": study.best_params,
    "performance": {
        "best_cv_score": study.best_value,
        "validation_score": validation_results["overall_mean"],
        "stability": validation_results["overall_std"]
    },
    "parameter_importance": importance,
    "recommendations": [
        "Use learning_rate=0.05 (most important parameter)",
        "max_depth=8 is optimal, avoid >12",
        "n_estimators saturates around 300"
    ]
}

# Save configuration
with open("best_config.yaml", "w") as f:
    yaml.dump({"hyperparameters": study.best_params}, f)
```

**Outputs**:
- Tuning report
- Best configuration file
- MLflow artifacts

## Artifacts

- `search_space.yaml` - Parameter definitions
- `best_config.yaml` - Optimal hyperparameters
- `tuning_report.json` - Complete analysis
- `optuna.db` - Study database
- `visualizations/` - Plots and charts
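
A hypothetical hand-off sketch: the `best_config.yaml` written in Step 6 can be reloaded by the downstream training pipeline. The literal values echo the report's recommendations and are illustrative only:

```python
import yaml

# Write the tuned config as Step 6 does (values are illustrative,
# taken from the report's recommendations)
best = {"hyperparameters": {"learning_rate": 0.05, "max_depth": 8}}
with open("best_config.yaml", "w") as f:
    yaml.dump(best, f)

# Later, in the training pipeline: reload the optimal hyperparameters
with open("best_config.yaml") as f:
    config = yaml.safe_load(f)

params = config["hyperparameters"]
# model = XGBClassifier(**params)  # hand off to training-pipeline-workflow
```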

## Next Workflows

After hyperparameter tuning:
- → **training-pipeline-workflow** with best params
- → **model-evaluation-workflow** for validation

## Quality Gates

- [ ] All steps completed successfully
- [ ] Metrics meet defined thresholds
- [ ] Documentation updated
- [ ] Artifacts versioned and stored
- [ ] Stakeholder approval obtained