omgkit 2.20.0 → 2.21.0
- package/README.md +125 -10
- package/package.json +1 -1
- package/plugin/agents/ai-architect-agent.md +282 -0
- package/plugin/agents/data-scientist-agent.md +221 -0
- package/plugin/agents/experiment-analyst-agent.md +318 -0
- package/plugin/agents/ml-engineer-agent.md +165 -0
- package/plugin/agents/mlops-engineer-agent.md +324 -0
- package/plugin/agents/model-optimizer-agent.md +287 -0
- package/plugin/agents/production-engineer-agent.md +360 -0
- package/plugin/agents/research-scientist-agent.md +274 -0
- package/plugin/commands/omgdata/augment.md +86 -0
- package/plugin/commands/omgdata/collect.md +81 -0
- package/plugin/commands/omgdata/label.md +83 -0
- package/plugin/commands/omgdata/split.md +83 -0
- package/plugin/commands/omgdata/validate.md +76 -0
- package/plugin/commands/omgdata/version.md +85 -0
- package/plugin/commands/omgdeploy/ab.md +94 -0
- package/plugin/commands/omgdeploy/cloud.md +89 -0
- package/plugin/commands/omgdeploy/edge.md +93 -0
- package/plugin/commands/omgdeploy/package.md +91 -0
- package/plugin/commands/omgdeploy/serve.md +92 -0
- package/plugin/commands/omgfeature/embed.md +93 -0
- package/plugin/commands/omgfeature/extract.md +93 -0
- package/plugin/commands/omgfeature/select.md +85 -0
- package/plugin/commands/omgfeature/store.md +97 -0
- package/plugin/commands/omgml/init.md +60 -0
- package/plugin/commands/omgml/status.md +82 -0
- package/plugin/commands/omgops/drift.md +87 -0
- package/plugin/commands/omgops/monitor.md +99 -0
- package/plugin/commands/omgops/pipeline.md +102 -0
- package/plugin/commands/omgops/registry.md +109 -0
- package/plugin/commands/omgops/retrain.md +91 -0
- package/plugin/commands/omgoptim/distill.md +90 -0
- package/plugin/commands/omgoptim/profile.md +92 -0
- package/plugin/commands/omgoptim/prune.md +81 -0
- package/plugin/commands/omgoptim/quantize.md +83 -0
- package/plugin/commands/omgtrain/baseline.md +78 -0
- package/plugin/commands/omgtrain/compare.md +99 -0
- package/plugin/commands/omgtrain/evaluate.md +85 -0
- package/plugin/commands/omgtrain/train.md +81 -0
- package/plugin/commands/omgtrain/tune.md +89 -0
- package/plugin/registry.yaml +252 -2
- package/plugin/skills/ml-systems/SKILL.md +65 -0
- package/plugin/skills/ml-systems/ai-accelerators/SKILL.md +342 -0
- package/plugin/skills/ml-systems/data-eng/SKILL.md +126 -0
- package/plugin/skills/ml-systems/deep-learning-primer/SKILL.md +143 -0
- package/plugin/skills/ml-systems/deployment-paradigms/SKILL.md +148 -0
- package/plugin/skills/ml-systems/dnn-architectures/SKILL.md +128 -0
- package/plugin/skills/ml-systems/edge-deployment/SKILL.md +366 -0
- package/plugin/skills/ml-systems/efficient-ai/SKILL.md +316 -0
- package/plugin/skills/ml-systems/feature-engineering/SKILL.md +151 -0
- package/plugin/skills/ml-systems/ml-frameworks/SKILL.md +187 -0
- package/plugin/skills/ml-systems/ml-serving-optimization/SKILL.md +371 -0
- package/plugin/skills/ml-systems/ml-systems-fundamentals/SKILL.md +103 -0
- package/plugin/skills/ml-systems/ml-workflow/SKILL.md +162 -0
- package/plugin/skills/ml-systems/mlops/SKILL.md +386 -0
- package/plugin/skills/ml-systems/model-deployment/SKILL.md +350 -0
- package/plugin/skills/ml-systems/model-dev/SKILL.md +160 -0
- package/plugin/skills/ml-systems/model-optimization/SKILL.md +339 -0
- package/plugin/skills/ml-systems/robust-ai/SKILL.md +395 -0
- package/plugin/skills/ml-systems/training-data/SKILL.md +152 -0
- package/plugin/workflows/ml-systems/data-preparation-workflow.md +276 -0
- package/plugin/workflows/ml-systems/edge-deployment-workflow.md +413 -0
- package/plugin/workflows/ml-systems/full-ml-lifecycle-workflow.md +405 -0
- package/plugin/workflows/ml-systems/hyperparameter-tuning-workflow.md +352 -0
- package/plugin/workflows/ml-systems/mlops-pipeline-workflow.md +384 -0
- package/plugin/workflows/ml-systems/model-deployment-workflow.md +392 -0
- package/plugin/workflows/ml-systems/model-development-workflow.md +218 -0
- package/plugin/workflows/ml-systems/model-evaluation-workflow.md +416 -0
- package/plugin/workflows/ml-systems/model-optimization-workflow.md +390 -0
- package/plugin/workflows/ml-systems/monitoring-drift-workflow.md +446 -0
- package/plugin/workflows/ml-systems/retraining-workflow.md +401 -0
- package/plugin/workflows/ml-systems/training-pipeline-workflow.md +382 -0
@@ -0,0 +1,82 @@
+---
+description: Display comprehensive ML project status including data, model, deployment, and pipeline health
+allowed-tools: Task, Read, Bash, Grep, Glob
+argument-hint: "[--detailed]"
+---
+
+# ML Project Status: $ARGUMENTS
+
+Show ML project status: **$ARGUMENTS**
+
+## Agent
+Uses **orchestrator-agent** for comprehensive status collection.
+
+## Parameters
+- **--detailed**: Show extended information for each component
+
+## Status Components
+
+### Data Status
+- Training set size and last update
+- Validation/Test set sizes
+- Data version (DVC)
+- Data quality metrics
+
+### Model Status
+- Current model version
+- Model architecture
+- Performance metrics (accuracy, F1, etc.)
+- Last training timestamp
+
+### Deployment Status
+- Production model version
+- Staging model version
+- Endpoint health
+- Request latency
+
+### Pipeline Status
+- Last pipeline run
+- Success/failure rate
+- Next scheduled run
+- Active experiments
+
+## Example Output
+```
+ML Project Status
+═══════════════════════════════════════════════════
+
+Project: customer-churn-prediction
+Type: Classification (Binary)
+Created: 2024-01-15
+
+DATA STATUS
+├── Training: 50,000 samples (updated: 2h ago)
+├── Validation: 10,000 samples
+├── Test: 10,000 samples
+└── Data Version: v1.2.3
+
+MODEL STATUS
+├── Current: XGBoost v2.1.0
+├── Accuracy: 94.2%
+├── F1-Score: 0.89
+└── Last trained: 1d ago
+
+DEPLOYMENT STATUS
+├── Production: v2.0.0 (deployed 3d ago)
+├── Staging: v2.1.0 (testing)
+└── Endpoint: https://api.example.com/predict
+
+PIPELINE STATUS
+├── Last run: 2h ago ✅
+├── Success rate: 98.5%
+└── Next scheduled: in 22h
+```
+
+## Progress
+- [ ] Data status collected
+- [ ] Model status collected
+- [ ] Deployment status collected
+- [ ] Pipeline status collected
+- [ ] Report generated
+
+Provide actionable insights for project health.
@@ -0,0 +1,87 @@
+---
+description: Detect and analyze data drift, label drift, and concept drift with statistical methods
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: <drift_type> [--reference <ref>] [--current <current>]
+---
+
+# Drift Detection: $ARGUMENTS
+
+Detect drift: **$ARGUMENTS**
+
+## Agent
+Uses **monitoring-agent** for drift detection.
+
+## Parameters
+- **drift_type**: feature | label | concept | all
+- **reference**: Path to reference data
+- **current**: Path to current data
+
+## Drift Types
+
+### Feature Drift
+- Input distribution change
+- P(X) has changed
+- Methods: KS test, PSI, Wasserstein
+
+### Label Drift
+- Target distribution change
+- P(Y) has changed
+- Methods: Chi-square, JS divergence
+
+### Concept Drift
+- P(Y|X) relationship change
+- Model degradation
+- Methods: Performance monitoring, DDM, ADWIN
+
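The DDM method listed under concept drift can be sketched in a few lines of standard-library Python. This is a minimal illustration of the Drift Detection Method, not omgkit's implementation: the class name `DDM`, its parameters, and the `"stable"`/`"warning"`/`"drift"` status strings are assumptions made for this example. It tracks the running error rate `p` and its standard deviation `s`, remembers the lowest `p + s` seen so far, and flags drift when the current level rises more than three minimum deviations above that minimum.

```python
import math


class DDM:
    """Drift Detection Method over a stream of 0/1 prediction errors (sketch)."""

    def __init__(self, min_samples=30, warning_level=2.0, drift_level=3.0):
        self.min_samples = min_samples
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.n = 0
        self.p = 0.0  # running error rate
        self.p_min = math.inf  # error rate at the lowest p + s seen so far
        self.s_min = math.inf  # standard deviation at that same point

    def update(self, error):
        """Feed one outcome (1 = misclassified, 0 = correct) and return a status."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(max(0.0, self.p * (1.0 - self.p)) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        level = self.p + s
        if level > self.p_min + self.drift_level * self.s_min:
            return "drift"
        if level > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


# A stream with a 10% error rate that degrades to 100% after 200 samples.
stream = [1 if i % 10 == 9 else 0 for i in range(200)] + [1] * 50
detector = DDM()
statuses = [detector.update(e) for e in stream]
```

The sketch does not reset its statistics after a drift signal; a caller would typically create a fresh detector once the model has been retrained.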
+## Code Template
+```python
+from omgkit.mlops import DriftDetector
+
+detector = DriftDetector()
+
+drift_report = detector.detect(
+    reference_data="data/reference/baseline.parquet",
+    current_data="data/production/current_week.parquet",
+    drift_types=["feature", "label", "concept"],
+    methods={
+        "feature": "ks_test",
+        "label": "chi_square",
+        "concept": "model_performance"
+    },
+    model_path="models/production/model.pt",
+    threshold=0.05
+)
+
+print(drift_report.summary())
+
+if drift_report.drift_detected:
+    detector.trigger_retraining()
+```
+
+## Statistical Methods
+- **KS Test**: Kolmogorov-Smirnov
+- **PSI**: Population Stability Index
+- **Wasserstein**: Earth Mover's Distance
+- **Chi-Square**: Categorical data
+- **JS Divergence**: Jensen-Shannon
+
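Two of the statistics listed above are simple enough to sketch with the standard library alone. The helpers below are illustrative assumptions, not part of omgkit: `psi` bins a numeric feature over the reference range and sums `(current - reference) * ln(current / reference)` per bin, and `js_divergence` compares two label distributions in base 2, so it ranges from 0 (identical) to 1 (disjoint).

```python
import math
from collections import Counter


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples (sketch)."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid a zero-width bin for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the reference range land in the edge bins.
            idx = math.floor((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def js_divergence(reference_labels, current_labels):
    """Jensen-Shannon divergence (base 2) between two label samples (sketch)."""
    classes = list(set(reference_labels) | set(current_labels))
    ref_counts, cur_counts = Counter(reference_labels), Counter(current_labels)
    p = [ref_counts[c] / len(reference_labels) for c in classes]
    q = [cur_counts[c] / len(current_labels) for c in classes]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


baseline = list(range(100))
shifted = [v + 50 for v in baseline]
print(psi(baseline, baseline), psi(baseline, shifted))
print(js_divergence(["churn", "stay"], ["churn", "stay"]))
```

A common rule of thumb treats a PSI above roughly 0.2 as a significant shift, but the right cutoff depends on the feature and should be tuned per project.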
+## Actions on Drift
+- Alert notification
+- Trigger retraining
+- Rollback model
+- Increase monitoring
+
+## Report Output
+- Drift scores per feature
+- Statistical significance
+- Trend visualization
+- Recommended actions
+
+## Progress
+- [ ] Data loaded
+- [ ] Drift computed
+- [ ] Analysis complete
+- [ ] Report generated
+- [ ] Actions triggered
+
+Proactively detect model degradation causes.
@@ -0,0 +1,99 @@
+---
+description: Setup ML monitoring and alerting for data quality, drift detection, and model performance
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: "[--metrics <metrics>] [--alerts <alerts>]"
+---
+
+# ML Monitoring: $ARGUMENTS
+
+Setup monitoring: **$ARGUMENTS**
+
+## Agent
+Uses **monitoring-agent** for comprehensive ML monitoring.
+
+## Parameters
+- **metrics**: List of metrics to monitor
+- **alerts**: Alert configurations
+
+## Monitoring Types
+
+### Data Quality
+- Null rate monitoring
+- Schema violations
+- Value distributions
+- Anomaly detection
+
+### Data Drift
+- Feature drift
+- Label drift
+- Concept drift
+- Statistical tests
+
+### Model Performance
+- Accuracy degradation
+- Latency increase
+- Error rate
+- Throughput
+
+### System Metrics
+- Memory usage
+- CPU utilization
+- Request rate
+- Response time
+
+## Code Template
+```python
+from omgkit.mlops import MonitoringSetup
+
+monitor = MonitoringSetup()
+
+monitor.setup(
+    model_endpoint="https://api.example.com/predict",
+    metrics=[
+        {"type": "null_rate", "column": "user_id", "threshold": 0.01},
+        {"type": "feature_drift", "method": "ks_test", "threshold": 0.05},
+        {"type": "concept_drift", "window": "7d"},
+        {"type": "accuracy", "reference": 0.94, "threshold": 0.02},
+        {"type": "latency_p99", "threshold_ms": 100},
+        {"type": "memory_usage", "threshold_pct": 80},
+        {"type": "error_rate", "threshold": 0.01}
+    ],
+    alerts=[
+        {
+            "name": "accuracy_drop",
+            "condition": "accuracy < 0.90",
+            "channel": "slack",
+            "severity": "critical"
+        },
+        {
+            "name": "high_latency",
+            "condition": "latency_p99 > 200",
+            "channel": "pagerduty",
+            "severity": "warning"
+        }
+    ],
+    dashboard="grafana"
+)
+```
+
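The template's alert conditions are plain strings such as `"accuracy < 0.90"`. How they are evaluated is not shown, so the following is only a sketch of one possible approach, assuming every condition has the form `<metric> <operator> <number>`. The `fired_alerts` helper and the `OPS` table are hypothetical names introduced for this example.

```python
import operator

# Comparison operators accepted in a condition string (an assumption of this sketch).
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def fired_alerts(alerts, metrics):
    """Return the names of alerts whose condition holds for the given metrics."""
    fired = []
    for alert in alerts:
        metric, op, threshold = alert["condition"].split()
        # Skip conditions that refer to metrics that were not reported.
        if metric in metrics and OPS[op](metrics[metric], float(threshold)):
            fired.append(alert["name"])
    return fired


alerts = [
    {"name": "accuracy_drop", "condition": "accuracy < 0.90"},
    {"name": "high_latency", "condition": "latency_p99 > 200"},
]
print(fired_alerts(alerts, {"accuracy": 0.88, "latency_p99": 150}))
```

Parsing a fixed three-token grammar avoids calling `eval` on configuration strings, which matters when alert definitions come from files that other people can edit.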
+## Alert Channels
+- Slack
+- PagerDuty
+- Email
+- Webhooks
+- SMS
+
+## Dashboard Features
+- Real-time metrics
+- Historical trends
+- Anomaly highlighting
+- Drill-down analysis
+
+## Progress
+- [ ] Metrics configured
+- [ ] Alerts defined
+- [ ] Dashboard created
+- [ ] Integration tested
+- [ ] Monitoring active
+
+Ensure production ML system health and reliability.
@@ -0,0 +1,102 @@
+---
+description: Create CI/CD pipeline for ML using GitHub Actions, GitLab CI, Jenkins, Kubeflow, or Airflow
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: <platform> [--template <template>]
+---
+
+# ML Pipeline: $ARGUMENTS
+
+Create pipeline: **$ARGUMENTS**
+
+## Agent
+Uses **mlops-engineer-agent** for pipeline creation.
+
+## Parameters
+- **platform**: github_actions | gitlab_ci | jenkins | kubeflow | airflow
+- **template**: training | deployment | full (default: full)
+
+## Pipeline Types
+
+### Training Pipeline
+- Data validation
+- Feature engineering
+- Model training
+- Model evaluation
+- Model registration
+
+### Deployment Pipeline
+- Model validation
+- Canary deployment
+- Integration tests
+- Production promotion
+
+### Full Pipeline
+- Complete CI/CD
+- End-to-end automation
+- Monitoring integration
+
+## Code Template
+```python
+from omgkit.mlops import PipelineBuilder
+
+builder = PipelineBuilder(platform="github_actions")
+
+pipeline = builder.create_pipeline(
+    name="ml-training-pipeline",
+    stages=[
+        {
+            "name": "data_validation",
+            "script": "scripts/validate_data.py",
+            "triggers": ["data_push"]
+        },
+        {
+            "name": "training",
+            "script": "scripts/train.py",
+            "resources": {"gpu": True, "memory": "16Gi"},
+            "triggers": ["data_validation_success"]
+        },
+        {
+            "name": "evaluation",
+            "script": "scripts/evaluate.py",
+            "triggers": ["training_success"]
+        },
+        {
+            "name": "deployment",
+            "script": "scripts/deploy.py",
+            "triggers": ["evaluation_pass"],
+            "conditions": ["accuracy > 0.9"]
+        }
+    ]
+)
+
+pipeline.save(".github/workflows/ml-pipeline.yml")
+```
+
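The stage list in the template encodes dependencies through trigger names such as `data_validation_success`. As a rough sketch of how a builder might derive an execution order from them, the example below assumes a trigger names an upstream stage when it ends in `_success` or `_pass`; that naming rule and the `execution_order` helper are assumptions of this example, not documented omgkit behavior. It uses `graphlib.TopologicalSorter`, available in the standard library from Python 3.9.

```python
from graphlib import TopologicalSorter

# Suffixes that mark a trigger as "upstream stage finished" (assumed convention).
SUFFIXES = ("_success", "_pass")


def execution_order(stages):
    """Order stage names so every stage runs after the stages that trigger it."""
    names = {stage["name"] for stage in stages}
    graph = {}
    for stage in stages:
        deps = set()
        for trigger in stage.get("triggers", []):
            for suffix in SUFFIXES:
                upstream = trigger[: -len(suffix)]
                if trigger.endswith(suffix) and upstream in names:
                    deps.add(upstream)
        # Triggers that match no stage (e.g. "data_push") are external events.
        graph[stage["name"]] = deps
    return list(TopologicalSorter(graph).static_order())


stages = [
    {"name": "deployment", "triggers": ["evaluation_pass"]},
    {"name": "evaluation", "triggers": ["training_success"]},
    {"name": "data_validation", "triggers": ["data_push"]},
    {"name": "training", "triggers": ["data_validation_success"]},
]
print(execution_order(stages))
```

`TopologicalSorter` raises `graphlib.CycleError` when stages trigger each other in a loop, which gives a natural validation step before any workflow file is generated.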
+## Platform Features
+
+### GitHub Actions
+- Matrix builds
+- Self-hosted runners
+- Secrets management
+- Artifact caching
+
+### Kubeflow
+- Kubernetes native
+- GPU scheduling
+- Experiment tracking
+- Pipeline versioning
+
+### Airflow
+- DAG scheduling
+- XCom data passing
+- Sensor triggers
+- Backfill support
+
+## Progress
+- [ ] Platform configured
+- [ ] Stages defined
+- [ ] Triggers set
+- [ ] Pipeline generated
+- [ ] Tests added
+
+Automate ML workflows with CI/CD best practices.
@@ -0,0 +1,109 @@
+---
+description: Model registry operations - register, list, promote, archive, and rollback model versions
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: <action> [--model <model>] [--version <version>]
+---
+
+# Model Registry: $ARGUMENTS
+
+Registry operation: **$ARGUMENTS**
+
+## Agent
+Uses **mlops-engineer-agent** for model registry management.
+
+## Parameters
+- **action**: register | list | promote | archive | rollback
+- **model**: Model name
+- **version**: Model version
+
+## Registry Actions
+
+### Register
+- Add new model version
+- Store metadata
+- Link artifacts
+- Tag for tracking
+
+### List
+- Show all versions
+- Filter by stage
+- Display metrics
+- Show lineage
+
+### Promote
+- Move to next stage
+- Staging → Production
+- Update endpoints
+- Notify stakeholders
+
+### Archive
+- Deprecate old versions
+- Preserve artifacts
+- Update documentation
+
+### Rollback
+- Revert to previous version
+- Restore endpoint
+- Emergency recovery
+
+## Code Template
+```python
+from omgkit.mlops import ModelRegistry
+import mlflow
+
+registry = ModelRegistry(tracking_uri="http://mlflow.example.com")
+
+# Register new model
+registry.register(
+    model_path="models/best_model.pt",
+    model_name="churn_predictor",
+    tags={
+        "framework": "pytorch",
+        "task": "classification",
+        "team": "data-science"
+    },
+    metrics={
+        "accuracy": 0.94,
+        "f1": 0.89,
+        "latency_ms": 15
+    }
+)
+
+# Promote to production
+registry.promote(
+    model_name="churn_predictor",
+    version="2.1.0",
+    stage="Production"
+)
+
+# Rollback if needed
+registry.rollback(
+    model_name="churn_predictor",
+    to_version="2.0.0"
+)
+
+# List all versions
+versions = registry.list_versions("churn_predictor")
+```
+
+## Model Stages
+- **None**: Development
+- **Staging**: Testing
+- **Production**: Live
+- **Archived**: Deprecated
+
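The stage transitions above can be modeled as a small state machine. The class below is an in-memory sketch for illustration only: `InMemoryRegistry` and its methods are hypothetical and do not talk to MLflow or any real registry. It assumes that promoting a version to Production archives whichever version was live before, and that a rollback is simply a promotion of an older version.

```python
class InMemoryRegistry:
    """Tracks the stage of each model version (illustrative sketch)."""

    STAGES = ("None", "Staging", "Production", "Archived")

    def __init__(self):
        self.stages = {}  # version -> stage name

    def register(self, version):
        # New versions start in the "None" (development) stage.
        self.stages[version] = "None"

    def promote(self, version, stage):
        if version not in self.stages:
            raise KeyError(f"unknown version: {version}")
        if stage not in self.STAGES:
            raise ValueError(f"unknown stage: {stage}")
        if stage == "Production":
            # Only one version is live at a time; archive the previous one.
            for other, current in self.stages.items():
                if current == "Production":
                    self.stages[other] = "Archived"
        self.stages[version] = stage

    def rollback(self, to_version):
        # A rollback re-promotes an earlier version to Production.
        self.promote(to_version, "Production")

    def production_version(self):
        live = [v for v, s in self.stages.items() if s == "Production"]
        return live[0] if live else None


registry = InMemoryRegistry()
registry.register("2.0.0")
registry.promote("2.0.0", "Production")
registry.register("2.1.0")
registry.promote("2.1.0", "Staging")
registry.promote("2.1.0", "Production")
registry.rollback("2.0.0")
print(registry.production_version(), registry.stages)
```

Keeping the archived version's record after a promotion is what makes the rollback a cheap metadata change rather than a redeployment from scratch.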
+## Metadata Stored
+- Training parameters
+- Performance metrics
+- Data version
+- Code commit
+- Environment specs
+
+## Progress
+- [ ] Action validated
+- [ ] Registry connected
+- [ ] Operation executed
+- [ ] Endpoints updated
+- [ ] Notifications sent
+
+Centralize model lifecycle management.
@@ -0,0 +1,91 @@
+---
+description: Trigger model retraining with full, incremental, or transfer learning strategies
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: <strategy> [--trigger <trigger>]
+---
+
+# Model Retraining: $ARGUMENTS
+
+Trigger retraining: **$ARGUMENTS**
+
+## Agent
+Uses **mlops-engineer-agent** for retraining orchestration.
+
+## Parameters
+- **strategy**: full | incremental | transfer (default: incremental)
+- **trigger**: manual | scheduled | drift_detected | performance_drop
+
+## Retraining Strategies
+
+### Full Retraining
+- Train from scratch
+- All historical data
+- New architecture possible
+- Use case: Major updates
+
+### Incremental Training
+- Update with new data
+- Stateful training
+- Faster convergence
+- Use case: Regular updates
+
+### Transfer Learning
+- Fine-tune from checkpoint
+- Domain adaptation
+- New task learning
+- Use case: Limited new data
+
+## Code Template
+```python
+from omgkit.mlops import RetrainingPipeline
+
+pipeline = RetrainingPipeline()
+
+pipeline.setup(
+    strategy="incremental",
+    triggers=[
+        {"type": "scheduled", "cron": "0 0 * * 0"},  # Weekly
+        {"type": "drift_detected", "threshold": 0.1},
+        {"type": "performance_drop", "metric": "accuracy", "threshold": 0.05}
+    ],
+    data_config={
+        "new_data_path": "data/production/new/",
+        "validation_data": "data/splits/val.parquet",
+        "min_new_samples": 1000
+    },
+    training_config={
+        "epochs": 10,
+        "learning_rate": 1e-4,
+        "early_stopping": True
+    },
+    validation_config={
+        "metrics": ["accuracy", "f1"],
+        "minimum_improvement": 0.01,
+        "fallback_on_regression": True
+    }
+)
+
+# Manual trigger
+pipeline.trigger(reason="manual_update")
+```
+
+## Safety Checks
+- Minimum improvement threshold
+- Regression detection
+- Automatic rollback
+- Shadow deployment
+
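The first two safety checks reduce to a comparison between the current model's validation metrics and the retrained candidate's. The function below is a sketch of that gate under stated assumptions: `retraining_decision` is a hypothetical helper, all metrics are treated as higher-is-better, and its keyword arguments mirror the `minimum_improvement` and `fallback_on_regression` keys from the template's `validation_config`.

```python
def retraining_decision(current, candidate, primary="accuracy",
                        minimum_improvement=0.01, fallback_on_regression=True):
    """Decide whether a retrained candidate should replace the current model."""
    # Regression detection: any tracked metric that got worse (or is missing).
    regressed = [
        name for name, value in current.items()
        if candidate.get(name, float("-inf")) < value
    ]
    if regressed and fallback_on_regression:
        return "keep_current"
    # Minimum improvement threshold on the primary metric.
    if candidate[primary] - current[primary] >= minimum_improvement:
        return "promote_candidate"
    return "keep_current"


current = {"accuracy": 0.94, "f1": 0.89}
print(retraining_decision(current, {"accuracy": 0.96, "f1": 0.90}))
print(retraining_decision(current, {"accuracy": 0.96, "f1": 0.85}))
```

Metrics where lower is better, such as latency, would need their sign flipped before being passed in, since the sketch compares every metric in the same direction.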
+## Triggers
+- **Scheduled**: Cron-based
+- **Drift**: Automatic on drift
+- **Performance**: Metrics degradation
+- **Data**: New data threshold
+
+## Progress
+- [ ] Trigger validated
+- [ ] Data prepared
+- [ ] Training started
+- [ ] Validation passed
+- [ ] Model deployed
+
+Maintain model freshness with automated retraining.
@@ -0,0 +1,90 @@
+---
+description: Knowledge distillation from teacher model to smaller student model for efficient deployment
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: "[--teacher <path>] [--student_config <config>]"
+---
+
+# Knowledge Distillation: $ARGUMENTS
+
+Distill knowledge: **$ARGUMENTS**
+
+## Agent
+Uses **performance-engineer-agent** for knowledge distillation.
+
+## Parameters
+- **teacher**: Path to teacher (large) model
+- **student_config**: Path to student model configuration
+
+## Distillation Types
+
+### Response-Based
+- Match soft labels
+- Temperature scaling
+- KL divergence loss
+
+### Feature-Based
+- Match intermediate layers
+- Attention transfer
+- Hidden state matching
+
+### Relation-Based
+- Match pairwise relations
+- Graph distillation
+- Contrastive learning
+
+## Code Template
+```python
+from omgkit.optimization import KnowledgeDistiller
+
+distiller = KnowledgeDistiller()
+
+# Define student model (smaller)
+student = distiller.create_student(
+    architecture="mlp",
+    hidden_dims=[64, 32],  # Smaller than teacher
+    output_dim=10
+)
+
+# Distill
+distilled_model = distiller.distill(
+    teacher_path="models/teacher_large.pt",
+    student=student,
+    train_data="data/splits/train.parquet",
+    temperature=4.0,
+    alpha=0.5,  # Weight for distillation loss
+    epochs=50
+)
+
+# Compare
+distiller.compare(
+    teacher="models/teacher_large.pt",
+    student=distilled_model,
+    test_data="data/splits/test.parquet"
+)
+```
+
+## Hyperparameters
+- **Temperature**: Higher = softer labels (typically 2-10)
+- **Alpha**: Balance between hard and soft labels
+- **Loss**: KL divergence + cross-entropy
+
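To make the three hyperparameters concrete, here is a per-example sketch of the response-based loss in plain Python. It is an illustration, not the code behind `KnowledgeDistiller`: the function names are assumptions, and the soft term is multiplied by the squared temperature, a common convention that keeps its gradient scale comparable to the hard-label term.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft-label term: how far the student's softened output is from the teacher's.
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # Hard-label term: ordinary cross-entropy at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]
print(distillation_loss(student, teacher, label=0))
```

With `alpha=1.0` the student learns only from the teacher's soft labels, and with `alpha=0.0` the loss reduces to standard supervised training.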
+## Benefits
+- Smaller model size
+- Faster inference
+- Lower memory
+- Maintained accuracy
+
+## Comparison Output
+- Size comparison
+- Speed comparison
+- Accuracy comparison
+- Layer-by-layer analysis
+
+## Progress
+- [ ] Teacher loaded
+- [ ] Student created
+- [ ] Distillation training
+- [ ] Quality validated
+- [ ] Report generated
+
+Transfer knowledge from large to small model efficiently.