omgkit 2.20.0 → 2.21.1
This diff shows the changes between publicly released versions of the package, as they appear in their respective public registries, and is provided for informational purposes only.
- package/README.md +125 -10
- package/package.json +1 -1
- package/plugin/agents/ai-architect-agent.md +282 -0
- package/plugin/agents/data-scientist-agent.md +221 -0
- package/plugin/agents/experiment-analyst-agent.md +318 -0
- package/plugin/agents/ml-engineer-agent.md +165 -0
- package/plugin/agents/mlops-engineer-agent.md +324 -0
- package/plugin/agents/model-optimizer-agent.md +287 -0
- package/plugin/agents/production-engineer-agent.md +360 -0
- package/plugin/agents/research-scientist-agent.md +274 -0
- package/plugin/commands/omgdata/augment.md +86 -0
- package/plugin/commands/omgdata/collect.md +81 -0
- package/plugin/commands/omgdata/label.md +83 -0
- package/plugin/commands/omgdata/split.md +83 -0
- package/plugin/commands/omgdata/validate.md +76 -0
- package/plugin/commands/omgdata/version.md +85 -0
- package/plugin/commands/omgdeploy/ab.md +94 -0
- package/plugin/commands/omgdeploy/cloud.md +89 -0
- package/plugin/commands/omgdeploy/edge.md +93 -0
- package/plugin/commands/omgdeploy/package.md +91 -0
- package/plugin/commands/omgdeploy/serve.md +92 -0
- package/plugin/commands/omgfeature/embed.md +93 -0
- package/plugin/commands/omgfeature/extract.md +93 -0
- package/plugin/commands/omgfeature/select.md +85 -0
- package/plugin/commands/omgfeature/store.md +97 -0
- package/plugin/commands/omgml/init.md +60 -0
- package/plugin/commands/omgml/status.md +82 -0
- package/plugin/commands/omgops/drift.md +87 -0
- package/plugin/commands/omgops/monitor.md +99 -0
- package/plugin/commands/omgops/pipeline.md +102 -0
- package/plugin/commands/omgops/registry.md +109 -0
- package/plugin/commands/omgops/retrain.md +91 -0
- package/plugin/commands/omgoptim/distill.md +90 -0
- package/plugin/commands/omgoptim/profile.md +92 -0
- package/plugin/commands/omgoptim/prune.md +81 -0
- package/plugin/commands/omgoptim/quantize.md +83 -0
- package/plugin/commands/omgtrain/baseline.md +78 -0
- package/plugin/commands/omgtrain/compare.md +99 -0
- package/plugin/commands/omgtrain/evaluate.md +85 -0
- package/plugin/commands/omgtrain/train.md +81 -0
- package/plugin/commands/omgtrain/tune.md +89 -0
- package/plugin/registry.yaml +252 -2
- package/plugin/skills/ml-systems/SKILL.md +65 -0
- package/plugin/skills/ml-systems/ai-accelerators/SKILL.md +342 -0
- package/plugin/skills/ml-systems/data-eng/SKILL.md +126 -0
- package/plugin/skills/ml-systems/deep-learning-primer/SKILL.md +143 -0
- package/plugin/skills/ml-systems/deployment-paradigms/SKILL.md +148 -0
- package/plugin/skills/ml-systems/dnn-architectures/SKILL.md +128 -0
- package/plugin/skills/ml-systems/edge-deployment/SKILL.md +366 -0
- package/plugin/skills/ml-systems/efficient-ai/SKILL.md +316 -0
- package/plugin/skills/ml-systems/feature-engineering/SKILL.md +151 -0
- package/plugin/skills/ml-systems/ml-frameworks/SKILL.md +187 -0
- package/plugin/skills/ml-systems/ml-serving-optimization/SKILL.md +371 -0
- package/plugin/skills/ml-systems/ml-systems-fundamentals/SKILL.md +103 -0
- package/plugin/skills/ml-systems/ml-workflow/SKILL.md +162 -0
- package/plugin/skills/ml-systems/mlops/SKILL.md +386 -0
- package/plugin/skills/ml-systems/model-deployment/SKILL.md +350 -0
- package/plugin/skills/ml-systems/model-dev/SKILL.md +160 -0
- package/plugin/skills/ml-systems/model-optimization/SKILL.md +339 -0
- package/plugin/skills/ml-systems/robust-ai/SKILL.md +395 -0
- package/plugin/skills/ml-systems/training-data/SKILL.md +152 -0
- package/plugin/workflows/ml-systems/data-preparation-workflow.md +276 -0
- package/plugin/workflows/ml-systems/edge-deployment-workflow.md +413 -0
- package/plugin/workflows/ml-systems/full-ml-lifecycle-workflow.md +405 -0
- package/plugin/workflows/ml-systems/hyperparameter-tuning-workflow.md +352 -0
- package/plugin/workflows/ml-systems/mlops-pipeline-workflow.md +384 -0
- package/plugin/workflows/ml-systems/model-deployment-workflow.md +392 -0
- package/plugin/workflows/ml-systems/model-development-workflow.md +218 -0
- package/plugin/workflows/ml-systems/model-evaluation-workflow.md +416 -0
- package/plugin/workflows/ml-systems/model-optimization-workflow.md +390 -0
- package/plugin/workflows/ml-systems/monitoring-drift-workflow.md +446 -0
- package/plugin/workflows/ml-systems/retraining-workflow.md +401 -0
- package/plugin/workflows/ml-systems/training-pipeline-workflow.md +382 -0
package/plugin/commands/omgoptim/profile.md
@@ -0,0 +1,92 @@
+---
+description: Profile model performance including latency, memory usage, compute requirements, and bottlenecks
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: <model_path> [--input_shape <shape>]
+---
+
+# Model Profiling: $ARGUMENTS
+
+Profile model: **$ARGUMENTS**
+
+## Agent
+Uses **performance-engineer-agent** for comprehensive profiling.
+
+## Parameters
+- **model_path**: Path to model
+- **input_shape**: Input tensor shape for inference
+
+## Profiling Dimensions
+
+### Latency
+- Cold start time
+- Warm inference time
+- P50/P90/P99 latencies
+- Batch size scaling
+
+### Memory
+- Peak memory usage
+- Memory per layer
+- Activation memory
+- Gradient memory (training)
+
+### Compute
+- FLOPs count
+- MACs (multiply-accumulate)
+- Parameter count
+- Layer-wise breakdown
+
+### Bottlenecks
+- Slowest layers
+- Memory-bound vs compute-bound
+- Data loading overhead
+- GPU utilization
+
+## Code Template
+```python
+from omgkit.optimization import ModelProfiler
+
+profiler = ModelProfiler()
+
+# Comprehensive profiling
+profile = profiler.profile(
+    model_path="models/best_model.pt",
+    input_shape=(1, 3, 224, 224),
+    device="cuda",
+    warmup_iterations=10,
+    profile_iterations=100
+)
+
+# Print summary
+print(f"Latency (P50): {profile.latency_p50:.2f}ms")
+print(f"Latency (P99): {profile.latency_p99:.2f}ms")
+print(f"Peak Memory: {profile.peak_memory_mb:.1f}MB")
+print(f"FLOPs: {profile.flops / 1e9:.2f}G")
+print(f"Parameters: {profile.params / 1e6:.2f}M")
+
+# Detailed breakdown
+profiler.layer_breakdown(profile)
+
+# Generate report
+profiler.report(profile, output="reports/profiling_report.html")
+```
+
+## Hardware Support
+- CPU profiling
+- CUDA profiling
+- TensorRT profiling
+- ONNX Runtime profiling
+
+## Optimization Recommendations
+- Identified bottlenecks
+- Suggested optimizations
+- Target hardware considerations
+- Trade-off analysis
+
+## Progress
+- [ ] Model loaded
+- [ ] Warmup complete
+- [ ] Profiling executed
+- [ ] Analysis complete
+- [ ] Report generated
+
+Identify performance bottlenecks and optimization opportunities.
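Editor's note: the warmup-then-measure loop behind the P50/P90/P99 latency figures above does not depend on the `omgkit` profiler. A minimal stdlib sketch, with a dummy workload standing in for model inference:

```python
import time
import statistics

def profile_latency(infer, warmup_iterations=10, profile_iterations=100):
    """Measure warm inference latency and report P50/P90/P99 in milliseconds."""
    for _ in range(warmup_iterations):   # discard cold-start runs
        infer()
    timings_ms = []
    for _ in range(profile_iterations):
        start = time.perf_counter()
        infer()
        timings_ms.append((time.perf_counter() - start) * 1000)
    # quantiles(n=100) yields the 1st..99th percentile cut points
    q = statistics.quantiles(timings_ms, n=100)
    return {"p50": q[49], "p90": q[89], "p99": q[98]}

# Dummy workload standing in for a real model's forward pass
stats = profile_latency(lambda: sum(i * i for i in range(10_000)))
print(f"P50: {stats['p50']:.3f}ms  P99: {stats['p99']:.3f}ms")
```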
package/plugin/commands/omgoptim/prune.md
@@ -0,0 +1,81 @@
+---
+description: Prune model weights using magnitude, structured, or lottery ticket methods to reduce redundancy
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: <method> [--sparsity <sparsity>]
+---
+
+# Model Pruning: $ARGUMENTS
+
+Prune model: **$ARGUMENTS**
+
+## Agent
+Uses **performance-engineer-agent** for model pruning.
+
+## Parameters
+- **method**: magnitude | structured | lottery_ticket (default: magnitude)
+- **sparsity**: Target sparsity level (default: 0.5 = 50% removed)
+
+## Pruning Methods
+
+### Magnitude Pruning
+- Remove smallest weights
+- Unstructured sparsity
+- Most flexible
+- Requires sparse support
+
+### Structured Pruning
+- Remove entire channels/filters
+- Hardware-friendly
+- Immediate speedup
+- More accuracy loss
+
+### Lottery Ticket
+- Find sparse subnetwork
+- Train from scratch
+- Best theoretical results
+- Most compute-intensive
+
+## Code Template
+```python
+from omgkit.optimization import ModelPruner
+
+pruner = ModelPruner()
+
+# Iterative magnitude pruning
+pruned_model = pruner.prune(
+    model_path="models/best_model.pt",
+    method="magnitude",
+    target_sparsity=0.7,
+    iterative_steps=5,
+    finetune_epochs=10,
+    finetune_data="data/splits/train.parquet"
+)
+
+# Report
+pruner.report(
+    original_model="models/best_model.pt",
+    pruned_model=pruned_model,
+    output="reports/pruning_report.html"
+)
+```
+
+## Sparsity Levels
+- 50%: Safe, minimal accuracy loss
+- 70%: Moderate, some accuracy drop
+- 90%: Aggressive, significant drop
+- 95%+: Extreme, research only
+
+## Best Practices
+- Iterative pruning (gradual)
+- Fine-tune after pruning
+- Validate on holdout set
+- Consider structured for deployment
+
+## Progress
+- [ ] Model analyzed
+- [ ] Pruning applied
+- [ ] Fine-tuning complete
+- [ ] Quality validated
+- [ ] Report generated
+
+Remove redundant weights while maintaining performance.
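Editor's note: magnitude pruning as described above reduces to a thresholding step: rank weights by absolute value and zero out the smallest fraction. A stdlib-only sketch (the real `omgkit` pruner operates on model tensors; plain Python lists are used here for illustration):

```python
def magnitude_prune(weights, target_sparsity):
    """Zero out the smallest-magnitude weights until target_sparsity of them are zero."""
    n_prune = int(len(weights) * target_sparsity)
    if n_prune == 0:
        return list(weights)
    # Threshold = magnitude of the n_prune-th smallest |w|
    # (ties at the threshold may prune slightly more than the target)
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.002, 0.3, -0.08]
pruned = magnitude_prune(weights, target_sparsity=0.5)
sparsity = pruned.count(0.0) / len(pruned)
print(pruned, sparsity)
```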
package/plugin/commands/omgoptim/quantize.md
@@ -0,0 +1,83 @@
+---
+description: Quantize model using dynamic, static, or QAT methods to reduce size and improve inference speed
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: <method> [--calibration_data <path>]
+---
+
+# Model Quantization: $ARGUMENTS
+
+Quantize model: **$ARGUMENTS**
+
+## Agent
+Uses **performance-engineer-agent** for model quantization.
+
+## Parameters
+- **method**: dynamic | static | qat (default: dynamic)
+- **calibration_data**: Path to calibration data (for static)
+
+## Quantization Methods
+
+### Dynamic Quantization
+- Quantize weights only
+- Activations at runtime
+- No calibration needed
+- Quick to apply
+
+### Static Quantization
+- Quantize weights and activations
+- Requires calibration data
+- Better performance
+- More accurate
+
+### QAT (Quantization-Aware Training)
+- Train with quantization simulation
+- Best accuracy preservation
+- Requires retraining
+- Most effort
+
+## Code Template
+```python
+from omgkit.optimization import ModelQuantizer
+
+quantizer = ModelQuantizer()
+
+# Static quantization
+quantized_model = quantizer.quantize(
+    model_path="models/best_model.pt",
+    method="static",
+    calibration_data="data/splits/val.parquet",
+    calibration_samples=1000,
+    dtype="int8"
+)
+
+# Evaluate quantized model
+metrics = quantizer.evaluate(
+    original_model="models/best_model.pt",
+    quantized_model=quantized_model,
+    test_data="data/splits/test.parquet"
+)
+
+print(f"Size reduction: {metrics['size_reduction']:.1%}")
+print(f"Speedup: {metrics['speedup']:.2f}x")
+print(f"Accuracy drop: {metrics['accuracy_drop']:.2%}")
+```
+
+## Output Formats
+- INT8: 4x smaller, ~2-4x faster
+- INT4: 8x smaller, experimental
+- FP16: 2x smaller, minimal accuracy loss
+
+## Comparison Metrics
+- Model size reduction
+- Inference speedup
+- Memory usage reduction
+- Accuracy degradation
+
+## Progress
+- [ ] Model loaded
+- [ ] Calibration complete
+- [ ] Quantization applied
+- [ ] Quality validated
+- [ ] Output saved
+
+Optimize model for production deployment.
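Editor's note: under both the dynamic and static schemes above, INT8 quantization maps floats to 8-bit integers through a scale and zero-point. A self-contained sketch of that affine mapping (illustrative only; real backends fold this arithmetic into fused kernels):

```python
def quantize_int8(values):
    """Affine (asymmetric) quantization of floats to int8 with scale and zero-point."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0                # guard against constant inputs
    zero_point = round(-128 - lo / scale)         # align float `lo` with int8 -128
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

values = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zp = quantize_int8(values)
recovered = dequantize(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(values, recovered))
print(q, f"max round-trip error: {max_err:.4f}")
```

The round-trip error is bounded by roughly one quantization step (`scale`), which is why the accuracy drop reported by `quantizer.evaluate` is usually small at INT8.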
package/plugin/commands/omgtrain/baseline.md
@@ -0,0 +1,78 @@
+---
+description: Train baseline models for comparison including logistic regression, random forest, and XGBoost
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: <task_type> [--models <models>]
+---
+
+# Baseline Training: $ARGUMENTS
+
+Train baselines: **$ARGUMENTS**
+
+## Agent
+Uses **model-engineer-agent** for baseline model training.
+
+## Parameters
+- **task_type**: classification | regression | detection | segmentation | nlp
+- **models**: List of baseline models to train
+
+## Default Baselines by Task
+
+### Classification
+- Logistic Regression
+- Random Forest
+- XGBoost
+- LightGBM
+- Naive Bayes
+
+### Regression
+- Linear Regression
+- Random Forest Regressor
+- XGBoost Regressor
+- LightGBM Regressor
+
+### NLP
+- TF-IDF + SVM
+- FastText
+- DistilBERT
+
+## Code Template
+```python
+from omgkit.training import BaselineTrainer
+
+trainer = BaselineTrainer(task_type="classification")
+
+# Train multiple baselines
+results = trainer.train_baselines(
+    data_path="data/splits/",
+    models=["logistic_regression", "random_forest", "xgboost"],
+    target_column="label",
+    metrics=["accuracy", "f1", "roc_auc"]
+)
+
+# Compare and select best
+best_model = trainer.select_best(metric="f1")
+
+# Save report
+trainer.report(output="reports/baseline_comparison.html")
+```
+
+## Output
+- Trained models
+- Comparison metrics table
+- Best baseline selection
+- Confusion matrices
+- Feature importance (if applicable)
+
+## Comparison Metrics
+- Primary: Task-specific (accuracy, F1, RMSE)
+- Secondary: Training time, inference time
+- Tertiary: Model size, memory usage
+
+## Progress
+- [ ] Data loaded
+- [ ] Models trained
+- [ ] Metrics computed
+- [ ] Best selected
+- [ ] Report generated
+
+Establish performance baseline before complex models.
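Editor's note: beneath even the simplest model in the list above sits the majority-class baseline, the accuracy of always predicting the most frequent training label. It is worth recording as a floor that every trained baseline should beat. A stdlib sketch:

```python
from collections import Counter

def majority_class_baseline(y_train, y_test):
    """Accuracy of always predicting the most frequent training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    correct = sum(1 for y in y_test if y == majority)
    return majority, correct / len(y_test)

# Hypothetical labels for illustration
y_train = [0, 0, 0, 1, 0, 1, 0, 0]
y_test = [0, 1, 0, 0, 1, 0]
label, acc = majority_class_baseline(y_train, y_test)
print(f"predict {label} always -> accuracy {acc:.2f}")
```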
package/plugin/commands/omgtrain/compare.md
@@ -0,0 +1,99 @@
+---
+description: Compare multiple experiments and models across metrics, visualize differences, and select best
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: <experiments> [--metric <metric>]
+---
+
+# Experiment Comparison: $ARGUMENTS
+
+Compare experiments: **$ARGUMENTS**
+
+## Agent
+Uses **model-engineer-agent** for experiment comparison.
+
+## Parameters
+- **experiments**: List of experiment names or run IDs
+- **metric**: Primary comparison metric
+
+## Comparison Dimensions
+
+### Metrics
+- Primary performance metric
+- Secondary metrics
+- Training metrics (loss curves)
+- Validation metrics
+
+### Resources
+- Training time
+- GPU memory usage
+- Inference latency
+- Model size
+
+### Parameters
+- Hyperparameter differences
+- Architecture variations
+- Data preprocessing
+
+## Code Template
+```python
+from omgkit.training import ExperimentComparer
+import mlflow
+
+comparer = ExperimentComparer(tracking_uri="http://mlflow.example.com")
+
+comparison = comparer.compare(
+    experiments=[
+        "churn_v1",
+        "churn_v2_deeper",
+        "churn_v3_ensemble"
+    ],
+    metrics=["accuracy", "f1", "roc_auc", "latency"],
+    primary_metric="f1"
+)
+
+# Visualize comparison
+comparer.plot_comparison(
+    comparison,
+    output="reports/experiment_comparison.html"
+)
+
+# Statistical significance test
+significance = comparer.significance_test(
+    experiment_a="churn_v2_deeper",
+    experiment_b="churn_v3_ensemble",
+    metric="f1"
+)
+
+# Select best model
+best = comparer.select_best(
+    comparison,
+    metric="f1",
+    constraints={"latency_ms": 50}
+)
+```
+
+## Visualizations
+- Parallel coordinates
+- Metric bar charts
+- Learning curve overlay
+- Hyperparameter importance
+
+## Statistical Tests
+- t-test for significance
+- Bootstrap confidence intervals
+- McNemar's test (classification)
+
+## Output
+- Comparison table
+- Winner recommendation
+- Trade-off analysis
+- Visual report
+
+## Progress
+- [ ] Experiments loaded
+- [ ] Metrics extracted
+- [ ] Comparison computed
+- [ ] Significance tested
+- [ ] Report generated
+
+Make data-driven model selection decisions.
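Editor's note: the bootstrap confidence interval listed under Statistical Tests above needs no ML dependencies; resample per-example correctness indicators for two models and take the percentile interval of their accuracy difference. A sketch with hypothetical paired outcomes (`significance_test` is omgkit's own API; this only illustrates the underlying idea):

```python
import random

def bootstrap_diff_ci(correct_a, correct_b, n_boot=2000, alpha=0.05, seed=0):
    """95% bootstrap CI for accuracy(A) - accuracy(B) from paired 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(correct_a)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample examples with replacement
        acc_a = sum(correct_a[i] for i in idx) / n
        acc_b = sum(correct_b[i] for i in idx) / n
        diffs.append(acc_a - acc_b)
    diffs.sort()
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical paired per-example outcomes for two models
rng = random.Random(42)
correct_a = [1 if rng.random() < 0.85 else 0 for _ in range(500)]
correct_b = [1 if rng.random() < 0.80 else 0 for _ in range(500)]
lo, hi = bootstrap_diff_ci(correct_a, correct_b)
print(f"95% CI for accuracy difference: [{lo:.3f}, {hi:.3f}]")
```

If the interval excludes zero, the difference between the two experiments is unlikely to be resampling noise.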
package/plugin/commands/omgtrain/evaluate.md
@@ -0,0 +1,85 @@
+---
+description: Evaluate model comprehensively including performance, robustness, fairness, and efficiency metrics
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: <model_path> [--test_data <path>]
+---
+
+# Model Evaluation: $ARGUMENTS
+
+Evaluate model: **$ARGUMENTS**
+
+## Agent
+Uses **model-engineer-agent** for comprehensive evaluation.
+
+## Parameters
+- **model_path**: Path to trained model
+- **test_data**: Path to test dataset
+
+## Evaluation Types
+
+### Performance Metrics
+- Accuracy, Precision, Recall, F1
+- ROC-AUC, PR-AUC
+- Confusion matrix
+- Classification report
+
+### Robustness Testing
+- Perturbation tests
+- Out-of-distribution detection
+- Adversarial evaluation
+- Noise sensitivity
+
+### Fairness Analysis
+- Demographic parity
+- Equal opportunity
+- Calibration by group
+- Bias detection
+
+### Efficiency Metrics
+- Inference latency
+- Memory usage
+- Throughput
+- Model size
+
+## Code Template
+```python
+from omgkit.training import ModelEvaluator
+
+evaluator = ModelEvaluator()
+
+results = evaluator.evaluate(
+    model_path="models/best_model.pt",
+    test_data="data/splits/test.parquet",
+    evaluation_types=[
+        "performance",
+        "robustness",
+        "fairness",
+        "efficiency"
+    ],
+    fairness_groups=["gender", "age_group"],
+    perturbation_levels=[0.01, 0.05, 0.1]
+)
+
+evaluator.report(
+    results=results,
+    output="reports/evaluation_report.html",
+    include_plots=True
+)
+```
+
+## Report Includes
+- Summary metrics
+- Per-class performance
+- Confusion matrix visualization
+- ROC/PR curves
+- Feature importance
+- Error analysis
+
+## Progress
+- [ ] Model loaded
+- [ ] Test data prepared
+- [ ] Metrics computed
+- [ ] Analysis complete
+- [ ] Report generated
+
+Comprehensive evaluation for production readiness.
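Editor's note: the performance metrics above (precision, recall, F1, confusion matrix) are standard definitions that can be computed from scratch. A stdlib sketch for the binary case:

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts plus precision/recall/F1 for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many are real
    recall = tp / (tp + fn) if tp + fn else 0.0      # of real positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}

m = binary_metrics(y_true=[1, 0, 1, 1, 0, 0, 1, 0],
                   y_pred=[1, 0, 0, 1, 0, 1, 1, 0])
print(m)
```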
package/plugin/commands/omgtrain/train.md
@@ -0,0 +1,81 @@
+---
+description: Train ML model with full pipeline including experiment tracking, checkpointing, and early stopping
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: <model_type> [--config <config>] [--experiment <name>]
+---
+
+# Model Training: $ARGUMENTS
+
+Train model: **$ARGUMENTS**
+
+## Agent
+Uses **model-engineer-agent** for comprehensive model training.
+
+## Parameters
+- **model_type**: Model architecture or type
+- **config**: Path to training config
+- **experiment**: Experiment name for tracking
+
+## Features
+- Automatic experiment tracking (MLflow/W&B)
+- Checkpointing
+- Early stopping
+- Learning rate scheduling
+- Gradient accumulation
+- Mixed precision training
+- Distributed training support
+
+## Code Template
+```python
+from omgkit.training import Trainer
+import mlflow
+
+trainer = Trainer(
+    model_type="pytorch",
+    config_path="config/model_config.yaml",
+    experiment_name="churn_prediction_v2"
+)
+
+model = trainer.create_model(
+    architecture="mlp",
+    input_dim=100,
+    hidden_dims=[256, 128, 64],
+    output_dim=2,
+    dropout=0.3
+)
+
+with mlflow.start_run():
+    history = trainer.train(
+        model=model,
+        train_data="data/splits/train.parquet",
+        val_data="data/splits/val.parquet",
+        epochs=100,
+        batch_size=64,
+        learning_rate=1e-3,
+        early_stopping=True,
+        patience=10,
+        callbacks=[
+            trainer.callbacks.checkpoint("models/checkpoints/"),
+            trainer.callbacks.lr_scheduler("cosine"),
+            trainer.callbacks.tensorboard("logs/")
+        ]
+    )
+
+    mlflow.log_metrics(history.final_metrics)
+    mlflow.pytorch.log_model(model, "model")
+```
+
+## Training Options
+- Optimizers: Adam, SGD, AdamW
+- Schedulers: cosine, step, plateau
+- Loss functions: cross-entropy, focal, custom
+- Regularization: dropout, weight decay
+
+## Progress
+- [ ] Config loaded
+- [ ] Model created
+- [ ] Training started
+- [ ] Validation monitored
+- [ ] Model saved
+
+Train production-ready models with full reproducibility.
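Editor's note: early stopping with patience, as enabled in the trainer call above, is a small state machine: stop once the validation loss has failed to improve for `patience` consecutive epochs. A framework-free sketch:

```python
class EarlyStopping:
    """Stop training once the monitored loss fails to improve for `patience` epochs."""
    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss     # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=3)
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.70, 0.73]  # plateaus after the third epoch
stopped_at = next(i for i, loss in enumerate(losses) if stopper.step(loss))
print(f"stopped at epoch {stopped_at}")
```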
package/plugin/commands/omgtrain/tune.md
@@ -0,0 +1,89 @@
+---
+description: Hyperparameter tuning with Optuna using TPE, random, grid, or CMA-ES optimization
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: "[--n_trials <n>] [--method <method>]"
+---
+
+# Hyperparameter Tuning: $ARGUMENTS
+
+Tune hyperparameters: **$ARGUMENTS**
+
+## Agent
+Uses **model-engineer-agent** for hyperparameter optimization.
+
+## Parameters
+- **n_trials**: Number of optimization trials (default: 100)
+- **method**: tpe | random | grid | cmaes (default: tpe)
+
+## Search Methods
+
+### TPE (Tree-structured Parzen Estimator)
+- Bayesian optimization
+- Sample-efficient
+- Handles conditionals
+
+### Random Search
+- Simple baseline
+- Parallelizable
+- Good for high dimensions
+
+### Grid Search
+- Exhaustive search
+- Reproducible
+- Best for small spaces
+
+### CMA-ES
+- Evolution strategy
+- Good for continuous spaces
+- Noise-robust
+
+## Code Template
+```python
+from omgkit.training import HyperparameterTuner
+import optuna
+
+tuner = HyperparameterTuner(
+    study_name="churn_prediction_tuning",
+    direction="maximize",
+    sampler=optuna.samplers.TPESampler()
+)
+
+search_space = {
+    "learning_rate": {"type": "float", "low": 1e-5, "high": 1e-1, "log": True},
+    "hidden_dim": {"type": "categorical", "choices": [64, 128, 256, 512]},
+    "num_layers": {"type": "int", "low": 1, "high": 5},
+    "dropout": {"type": "float", "low": 0.0, "high": 0.5},
+    "batch_size": {"type": "categorical", "choices": [16, 32, 64, 128]}
+}
+
+best_params = tuner.tune(
+    train_fn=train_model,
+    search_space=search_space,
+    n_trials=100,
+    metric="val_f1",
+    pruner=optuna.pruners.MedianPruner()
+)
+
+tuner.report(output="reports/tuning_report.html")
+```
+
+## Pruning
+- Median pruner: Stop unpromising trials
+- Percentile pruner: More aggressive
+- Hyperband: Multi-fidelity
+
+## Output
+- Best hyperparameters
+- Optimization history
+- Parameter importance
+- Parallel coordinate plot
+- HTML report
+
+## Progress
+- [ ] Search space defined
+- [ ] Optimization started
+- [ ] Trials completed
+- [ ] Best params found
+- [ ] Report generated
+
+Find optimal hyperparameters efficiently.
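Editor's note: random search, listed above as the simple baseline, needs nothing beyond a parameter sampler and an objective; Optuna's TPE sampler and pruners replace exactly this loop with a smarter one. A stdlib sketch over a toy objective (all names here are hypothetical):

```python
import random

def random_search(objective, space, n_trials=100, seed=0):
    """Maximize `objective` by sampling each parameter independently from `space`."""
    rng = random.Random(seed)
    best_params, best_value = None, float("-inf")
    for _ in range(n_trials):
        params = {
            # tuple = continuous (low, high) range; list = categorical choices
            name: (rng.uniform(*spec) if isinstance(spec, tuple) else rng.choice(spec))
            for name, spec in space.items()
        }
        value = objective(params)
        if value > best_value:
            best_params, best_value = params, value
    return best_params, best_value

# Toy objective peaking at lr=0.01 with hidden_dim=256
def objective(p):
    return -abs(p["lr"] - 0.01) - (0.1 if p["hidden_dim"] != 256 else 0.0)

space = {"lr": (1e-4, 1e-1), "hidden_dim": [64, 128, 256, 512]}
best_params, best_value = random_search(objective, space, n_trials=200)
print(best_params, f"best value: {best_value:.4f}")
```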