omgkit 2.20.0 → 2.21.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +125 -10
- package/package.json +1 -1
- package/plugin/agents/ai-architect-agent.md +282 -0
- package/plugin/agents/data-scientist-agent.md +221 -0
- package/plugin/agents/experiment-analyst-agent.md +318 -0
- package/plugin/agents/ml-engineer-agent.md +165 -0
- package/plugin/agents/mlops-engineer-agent.md +324 -0
- package/plugin/agents/model-optimizer-agent.md +287 -0
- package/plugin/agents/production-engineer-agent.md +360 -0
- package/plugin/agents/research-scientist-agent.md +274 -0
- package/plugin/commands/omgdata/augment.md +86 -0
- package/plugin/commands/omgdata/collect.md +81 -0
- package/plugin/commands/omgdata/label.md +83 -0
- package/plugin/commands/omgdata/split.md +83 -0
- package/plugin/commands/omgdata/validate.md +76 -0
- package/plugin/commands/omgdata/version.md +85 -0
- package/plugin/commands/omgdeploy/ab.md +94 -0
- package/plugin/commands/omgdeploy/cloud.md +89 -0
- package/plugin/commands/omgdeploy/edge.md +93 -0
- package/plugin/commands/omgdeploy/package.md +91 -0
- package/plugin/commands/omgdeploy/serve.md +92 -0
- package/plugin/commands/omgfeature/embed.md +93 -0
- package/plugin/commands/omgfeature/extract.md +93 -0
- package/plugin/commands/omgfeature/select.md +85 -0
- package/plugin/commands/omgfeature/store.md +97 -0
- package/plugin/commands/omgml/init.md +60 -0
- package/plugin/commands/omgml/status.md +82 -0
- package/plugin/commands/omgops/drift.md +87 -0
- package/plugin/commands/omgops/monitor.md +99 -0
- package/plugin/commands/omgops/pipeline.md +102 -0
- package/plugin/commands/omgops/registry.md +109 -0
- package/plugin/commands/omgops/retrain.md +91 -0
- package/plugin/commands/omgoptim/distill.md +90 -0
- package/plugin/commands/omgoptim/profile.md +92 -0
- package/plugin/commands/omgoptim/prune.md +81 -0
- package/plugin/commands/omgoptim/quantize.md +83 -0
- package/plugin/commands/omgtrain/baseline.md +78 -0
- package/plugin/commands/omgtrain/compare.md +99 -0
- package/plugin/commands/omgtrain/evaluate.md +85 -0
- package/plugin/commands/omgtrain/train.md +81 -0
- package/plugin/commands/omgtrain/tune.md +89 -0
- package/plugin/registry.yaml +252 -2
- package/plugin/skills/ml-systems/SKILL.md +65 -0
- package/plugin/skills/ml-systems/ai-accelerators/SKILL.md +342 -0
- package/plugin/skills/ml-systems/data-eng/SKILL.md +126 -0
- package/plugin/skills/ml-systems/deep-learning-primer/SKILL.md +143 -0
- package/plugin/skills/ml-systems/deployment-paradigms/SKILL.md +148 -0
- package/plugin/skills/ml-systems/dnn-architectures/SKILL.md +128 -0
- package/plugin/skills/ml-systems/edge-deployment/SKILL.md +366 -0
- package/plugin/skills/ml-systems/efficient-ai/SKILL.md +316 -0
- package/plugin/skills/ml-systems/feature-engineering/SKILL.md +151 -0
- package/plugin/skills/ml-systems/ml-frameworks/SKILL.md +187 -0
- package/plugin/skills/ml-systems/ml-serving-optimization/SKILL.md +371 -0
- package/plugin/skills/ml-systems/ml-systems-fundamentals/SKILL.md +103 -0
- package/plugin/skills/ml-systems/ml-workflow/SKILL.md +162 -0
- package/plugin/skills/ml-systems/mlops/SKILL.md +386 -0
- package/plugin/skills/ml-systems/model-deployment/SKILL.md +350 -0
- package/plugin/skills/ml-systems/model-dev/SKILL.md +160 -0
- package/plugin/skills/ml-systems/model-optimization/SKILL.md +339 -0
- package/plugin/skills/ml-systems/robust-ai/SKILL.md +395 -0
- package/plugin/skills/ml-systems/training-data/SKILL.md +152 -0
- package/plugin/workflows/ml-systems/data-preparation-workflow.md +276 -0
- package/plugin/workflows/ml-systems/edge-deployment-workflow.md +413 -0
- package/plugin/workflows/ml-systems/full-ml-lifecycle-workflow.md +405 -0
- package/plugin/workflows/ml-systems/hyperparameter-tuning-workflow.md +352 -0
- package/plugin/workflows/ml-systems/mlops-pipeline-workflow.md +384 -0
- package/plugin/workflows/ml-systems/model-deployment-workflow.md +392 -0
- package/plugin/workflows/ml-systems/model-development-workflow.md +218 -0
- package/plugin/workflows/ml-systems/model-evaluation-workflow.md +416 -0
- package/plugin/workflows/ml-systems/model-optimization-workflow.md +390 -0
- package/plugin/workflows/ml-systems/monitoring-drift-workflow.md +446 -0
- package/plugin/workflows/ml-systems/retraining-workflow.md +401 -0
- package/plugin/workflows/ml-systems/training-pipeline-workflow.md +382 -0
package/plugin/commands/omgdata/augment.md
@@ -0,0 +1,86 @@
+---
+description: Augment data to increase diversity and quantity for image, text, tabular, audio, and timeseries
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: <data_type> [--techniques <techniques>] [--factor <factor>]
+---
+
+# Data Augmentation: $ARGUMENTS
+
+Augment data: **$ARGUMENTS**
+
+## Agent
+Uses **data-engineer-agent** for data augmentation.
+
+## Parameters
+- **data_type**: image | text | tabular | audio | timeseries
+- **techniques**: Comma-separated list of techniques
+- **factor**: Augmentation multiplier (default: 2.0)
+
+## Techniques by Type
+
+### Image
+- random_flip, random_rotation
+- random_crop, color_jitter
+- gaussian_noise, cutout
+- mixup, cutmix
+
+### Text
+- synonym_replacement
+- random_insertion/swap/deletion
+- back_translation
+- contextual_augmentation (BERT)
+
+### Tabular
+- SMOTE for imbalanced data
+- noise_injection
+- mixup
+- feature_permutation
+
+### Audio
+- time_stretch, pitch_shift
+- add_noise, time_shift
+- spectrogram augmentation
+
+### Timeseries
+- window_slicing
+- magnitude_warping
+- time_warping
+- jittering
+
+## Code Template
+```python
+from omgkit.data import DataAugmenter
+import albumentations as A
+
+augmenter = DataAugmenter(data_type="image")
+
+transform = augmenter.create_pipeline([
+    A.RandomRotate90(p=0.5),
+    A.Flip(p=0.5),
+    A.RandomBrightnessContrast(p=0.3),
+    A.GaussNoise(var_limit=(10, 50), p=0.2),
+    A.CoarseDropout(max_holes=8, max_height=32, max_width=32, p=0.3)
+])
+
+augmented_data = augmenter.augment(
+    data_path="data/processed/train_images/",
+    transform=transform,
+    factor=3.0,
+    output_path="data/augmented/train_images/"
+)
+```
+
+## Best Practices
+- Preserve label integrity
+- Match augmentation to task
+- Balance augmentation strength
+- Validate augmented samples
+
+## Progress
+- [ ] Data loaded
+- [ ] Techniques configured
+- [ ] Augmentation applied
+- [ ] Quality validated
+- [ ] Output saved
+
+Increase training data diversity while maintaining quality.
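The `DataAugmenter` wrapper is omgkit's own API. For comparison, a minimal sketch of the same image pipeline in plain albumentations, assuming the 1.x API (`A.Flip` and `var_limit` were renamed in 2.x) and a stand-in random image:

```python
# Standalone albumentations 1.x sketch of the pipeline above (no omgkit).
import albumentations as A
import numpy as np

transform = A.Compose([
    A.RandomRotate90(p=0.5),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
    A.GaussNoise(var_limit=(10, 50), p=0.2),  # 1.x argument name
])

image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)  # stand-in image
augmented = transform(image=image)["image"]  # loop per image copy for factor > 1
```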
package/plugin/commands/omgdata/collect.md
@@ -0,0 +1,81 @@
+---
+description: Collect data from multiple sources including databases, APIs, files, and streaming systems
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: <source_type> [--config <config_file>]
+---
+
+# Data Collection: $ARGUMENTS
+
+Collect data from: **$ARGUMENTS**
+
+## Agent
+Uses **data-engineer-agent** for data collection operations.
+
+## Parameters
+- **source_type**: database | api | files | streaming | scraping
+- **config**: Path to collection configuration file
+
+## Supported Sources
+
+### Database
+- PostgreSQL, MySQL, MongoDB
+- Connection pooling
+- Query optimization
+- Batch extraction
+
+### API
+- REST/GraphQL endpoints
+- Authentication handling
+- Rate limiting
+- Pagination support
+
+### Files
+- CSV, Parquet, JSON, Excel
+- S3, GCS, Azure Blob
+- Local filesystem
+- Compressed archives
+
+### Streaming
+- Kafka consumers
+- Kinesis streams
+- Real-time ingestion
+
+## Code Template
+```python
+from omgkit.data import DataCollector
+
+collector = DataCollector(config="config/data_config.yaml")
+
+data = collector.collect(
+    sources=[
+        {"type": "database", "connection": "postgres://..."},
+        {"type": "api", "endpoint": "https://api.example.com/data"},
+        {"type": "files", "path": "data/raw/*.csv"}
+    ],
+    validation_schema="schemas/raw_data_schema.json"
+)
+
+collector.save(
+    data=data,
+    output_path="data/raw/collected_data.parquet",
+    metadata=True
+)
+```
+
+## Actions
+1. Read collection configuration
+2. Connect to data sources
+3. Validate schemas during collection
+4. Apply data transformations
+5. Save to raw data directory
+6. Update data manifest
+
+## Progress
+- [ ] Config loaded
+- [ ] Sources connected
+- [ ] Data collected
+- [ ] Schema validated
+- [ ] Data saved
+- [ ] Manifest updated
+
+Track all data lineage for reproducibility.
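The `DataCollector` API is omgkit's. As a minimal sketch of the file-source path alone, written with plain pandas and a hypothetical `_source_file` lineage column (our convention, not omgkit's):

```python
# Gather raw CSVs, stamp simple lineage, and write one Parquet file.
import glob
import pandas as pd

frames = []
for path in glob.glob("data/raw/*.csv"):
    df = pd.read_csv(path)
    df["_source_file"] = path  # hypothetical lineage column
    frames.append(df)

pd.concat(frames, ignore_index=True).to_parquet("data/raw/collected_data.parquet")
```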
package/plugin/commands/omgdata/label.md
@@ -0,0 +1,83 @@
+---
+description: Label data using manual annotation, weak supervision, active learning, or automated labeling
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: <strategy> [--config <config>]
+---
+
+# Data Labeling: $ARGUMENTS
+
+Label data using: **$ARGUMENTS**
+
+## Agent
+Uses **data-engineer-agent** for data labeling workflows.
+
+## Parameters
+- **strategy**: manual | weak_supervision | active_learning | auto_label
+- **config**: Path to labeling configuration
+
+## Labeling Strategies
+
+### Manual
+- Export to Label Studio format
+- Human annotation interface
+- Quality assurance workflows
+- Inter-annotator agreement
+
+### Weak Supervision
+- Labeling functions with Snorkel
+- Programmatic labeling rules
+- Probabilistic label model
+- Noise-aware training
+
+### Active Learning
+- Uncertainty sampling
+- Query-by-committee
+- Prioritized labeling queue
+- Model-in-the-loop
+
+### Auto Label
+- Pretrained model inference
+- Confidence thresholding
+- Human review for low confidence
+- Continuous improvement
+
+## Code Template
+```python
+from omgkit.data import DataLabeler
+
+labeler = DataLabeler(strategy="weak_supervision")
+
+@labeler.labeling_function()
+def lf_keyword_spam(x):
+    spam_keywords = ["free", "win", "click here", "urgent"]
+    if any(kw in x.text.lower() for kw in spam_keywords):
+        return "SPAM"
+    return "ABSTAIN"
+
+@labeler.labeling_function()
+def lf_length(x):
+    if len(x.text) < 10:
+        return "SPAM"
+    return "ABSTAIN"
+
+labels = labeler.label(
+    data_path="data/raw/emails.parquet",
+    labeling_functions=[lf_keyword_spam, lf_length],
+    output_path="data/labeled/emails_labeled.parquet"
+)
+```
+
+## Metrics
+- Label coverage
+- Label accuracy (if ground truth available)
+- Annotator agreement
+- Model confidence distribution
+
+## Progress
+- [ ] Strategy configured
+- [ ] Labeling functions defined
+- [ ] Labels generated
+- [ ] Quality assessed
+- [ ] Output saved
+
+Maximize label quality with minimal manual effort.
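The decorator style above mirrors Snorkel, which the Weak Supervision section names. A minimal sketch in Snorkel's own API, noting that Snorkel uses integer labels with -1 for abstain rather than the strings in the template; a `text` column is assumed:

```python
# The two rules from the template, expressed directly with snorkel.labeling.
import pandas as pd
from snorkel.labeling import PandasLFApplier, labeling_function
from snorkel.labeling.model import LabelModel

ABSTAIN, HAM, SPAM = -1, 0, 1

@labeling_function()
def lf_keyword_spam(x):
    keywords = ["free", "win", "click here", "urgent"]
    return SPAM if any(kw in x.text.lower() for kw in keywords) else ABSTAIN

@labeling_function()
def lf_length(x):
    return SPAM if len(x.text) < 10 else ABSTAIN

df = pd.read_parquet("data/raw/emails.parquet")  # assumes a `text` column
L = PandasLFApplier([lf_keyword_spam, lf_length]).apply(df)
label_model = LabelModel(cardinality=2)
label_model.fit(L)
df["label"] = label_model.predict(L)
```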
package/plugin/commands/omgdata/split.md
@@ -0,0 +1,83 @@
+---
+description: Split data into train/validation/test sets with stratified, temporal, or group-based strategies
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: "[--strategy <strategy>] [--ratios <ratios>]"
+---
+
+# Data Split: $ARGUMENTS
+
+Split data: **$ARGUMENTS**
+
+## Agent
+Uses **data-engineer-agent** for data splitting.
+
+## Parameters
+- **strategy**: random | stratified | temporal | group (default: stratified)
+- **ratios**: Train/Val/Test ratios (default: "0.7,0.15,0.15")
+
+## Splitting Strategies
+
+### Random
+- Simple random sampling
+- Use case: IID data, no dependencies
+- Fastest approach
+
+### Stratified
+- Maintains class distribution
+- Use case: Classification with imbalanced classes
+- Ensures representative splits
+
+### Temporal
+- Split by time (no future leakage)
+- Use case: Time series, forecasting
+- Respects temporal order
+
+### Group
+- Keep groups together (e.g., same user)
+- Use case: User-level data
+- Prevents data leakage
+
+## Code Template
+```python
+from omgkit.data import DataSplitter
+
+splitter = DataSplitter()
+
+# Stratified split
+train, val, test = splitter.split(
+    data_path="data/processed/dataset.parquet",
+    strategy="stratified",
+    stratify_column="label",
+    ratios=[0.7, 0.15, 0.15],
+    random_state=42
+)
+
+# Temporal split (for time series)
+train, val, test = splitter.split(
+    data_path="data/processed/transactions.parquet",
+    strategy="temporal",
+    time_column="timestamp",
+    train_end="2023-10-01",
+    val_end="2023-11-01"
+)
+
+splitter.save_splits(
+    train=train, val=val, test=test,
+    output_dir="data/splits/"
+)
```
+
+## Validation
+- No data leakage between splits
+- Class distribution preserved
+- Temporal order respected
+- Group integrity maintained
+
+## Progress
+- [ ] Strategy selected
+- [ ] Data loaded
+- [ ] Split executed
+- [ ] Validation passed
+- [ ] Splits saved
+
+Prevent data leakage for valid model evaluation.
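The stratified 0.7/0.15/0.15 split reduces to two chained scikit-learn calls; a minimal sketch, assuming a `label` column:

```python
# Stratify twice: 70% train, then split the remaining 30% evenly into val/test.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_parquet("data/processed/dataset.parquet")
train, rest = train_test_split(df, test_size=0.30, stratify=df["label"], random_state=42)
val, test = train_test_split(rest, test_size=0.50, stratify=rest["label"], random_state=42)
```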
package/plugin/commands/omgdata/validate.md
@@ -0,0 +1,76 @@
+---
+description: Validate data quality with schema checks, null analysis, range validation, and distribution analysis
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: <data_path> [--schema <schema>] [--rules <rules>]
+---
+
+# Data Validation: $ARGUMENTS
+
+Validate data at: **$ARGUMENTS**
+
+## Agent
+Uses **data-engineer-agent** for comprehensive data validation.
+
+## Parameters
+- **data_path**: Path to data file or directory
+- **schema**: Path to schema definition (optional)
+- **rules**: Path to validation rules (optional)
+
+## Validation Types
+
+### Schema Validation
+- Column names match expected
+- Data types correct
+- Required columns present
+- Nullable constraints
+
+### Quality Validation
+- Null percentage thresholds
+- Unique constraints
+- Value range validation
+- Duplicate detection
+
+### Statistical Validation
+- Distribution drift detection
+- Outlier identification
+- Correlation analysis
+- Cardinality checks
+
+## Code Template
+```python
+from omgkit.data import DataValidator
+import great_expectations as ge
+
+validator = DataValidator()
+
+expectations = validator.create_expectations([
+    {"column": "user_id", "check": "not_null"},
+    {"column": "user_id", "check": "unique"},
+    {"column": "age", "check": "between", "min": 0, "max": 150},
+    {"column": "email", "check": "regex", "pattern": r"^[\w\.-]+@[\w\.-]+\.\w+$"},
+    {"column": "signup_date", "check": "date_format", "format": "%Y-%m-%d"}
+])
+
+results = validator.validate(
+    data_path="data/raw/users.parquet",
+    expectations=expectations
+)
+
+validator.generate_report(results, output="reports/validation_report.html")
+```
+
+## Output Report
+- Total checks executed
+- Passed/Failed/Warning counts
+- Per-column statistics
+- Recommended actions
+- Data quality score
+
+## Progress
+- [ ] Data loaded
+- [ ] Schema validated
+- [ ] Quality checks run
+- [ ] Statistics computed
+- [ ] Report generated
+
+Fail fast on critical violations, warn on soft issues.
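Stripped of the framework, the five expectations above are plain column assertions; a fail-fast pandas sketch using the same column names:

```python
# Each assert mirrors one expectation; the script stops on the first violation.
import pandas as pd

df = pd.read_parquet("data/raw/users.parquet")

assert df["user_id"].notna().all(), "user_id contains nulls"
assert df["user_id"].is_unique, "user_id contains duplicates"
assert df["age"].between(0, 150).all(), "age out of range"
assert df["email"].str.match(r"^[\w\.-]+@[\w\.-]+\.\w+$").all(), "malformed email"
pd.to_datetime(df["signup_date"], format="%Y-%m-%d")  # raises on bad dates
```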
package/plugin/commands/omgdata/version.md
@@ -0,0 +1,85 @@
+---
+description: Version data with DVC for reproducibility - commit, checkout, diff, and history operations
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: <action> [--message <message>]
+---
+
+# Data Version: $ARGUMENTS
+
+Version data: **$ARGUMENTS**
+
+## Agent
+Uses **data-engineer-agent** for data versioning with DVC.
+
+## Parameters
+- **action**: commit | checkout | diff | history
+- **message**: Version commit message
+
+## Actions
+
+### Commit
+- Track changes with DVC
+- Create git commit with data hash
+- Push to remote storage
+- Tag version
+
+### Checkout
+- Restore specific data version
+- Download from remote storage
+- Verify data integrity
+
+### Diff
+- Compare data versions
+- Show added/removed files
+- Statistical differences
+
+### History
+- Show version history
+- List all data commits
+- Display metadata
+
+## Code Template
+```python
+from omgkit.data import DataVersioner
+
+versioner = DataVersioner()
+
+# Commit new version
+versioner.commit(
+    data_paths=["data/processed/"],
+    message="Added augmented training data v1.2",
+    tags=["training", "augmented"]
+)
+
+# Checkout specific version
+versioner.checkout(version="v1.1.0")
+
+# Compare versions
+diff = versioner.diff(v1="v1.0.0", v2="v1.2.0")
+print(diff)
+
+# Show history
+history = versioner.history()
+for version in history:
+    print(f"{version.tag}: {version.message} ({version.date})")
+```
+
+## Best Practices
+- Version data with each significant change
+- Include meaningful commit messages
+- Tag releases for reproducibility
+- Link data versions to model versions
+
+## Remote Storage
+- S3, GCS, Azure Blob
+- SSH/SFTP servers
+- Local network storage
+
+## Progress
+- [ ] Action validated
+- [ ] DVC operation executed
+- [ ] Remote synced
+- [ ] Git updated
+- [ ] Verification complete
+
+Enable full data reproducibility and lineage tracking.
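Under the hood, the commit action reduces to stock DVC plus git; a sketch of the equivalent commands driven via subprocess (the dvc/git invocations are standard, the wrapper function is ours):

```python
# dvc add hashes the data and writes a pointer file; git versions the pointer.
import subprocess

def run(*cmd):
    subprocess.run(cmd, check=True)

run("dvc", "add", "data/processed")
run("git", "add", "data/processed.dvc", ".gitignore")
run("git", "commit", "-m", "Added augmented training data v1.2")
run("git", "tag", "v1.2.0")
run("dvc", "push")  # upload the hashed data to remote storage
```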
package/plugin/commands/omgdeploy/ab.md
@@ -0,0 +1,94 @@
+---
+description: Set up A/B testing between model versions with traffic splitting and statistical analysis
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: "[--models <models>] [--traffic_split <split>]"
+---
+
+# A/B Testing: $ARGUMENTS
+
+Set up A/B test: **$ARGUMENTS**
+
+## Agent
+Uses **deployment-agent** for A/B testing configuration.
+
+## Parameters
+- **models**: List of model versions to test
+- **traffic_split**: Traffic distribution percentages
+
+## A/B Test Components
+
+### Traffic Splitting
+- Percentage-based routing
+- User-based stickiness
+- Geographic routing
+- Feature-based routing
+
+### Metrics Collection
+- Primary metrics (accuracy, latency)
+- Business metrics (conversion, revenue)
+- User experience metrics
+- System metrics
+
+### Statistical Analysis
+- Significance testing
+- Confidence intervals
+- Sample size calculation
+- Stopping rules
+
+## Code Template
+```python
+from omgkit.deployment import ABTester
+
+ab_tester = ABTester()
+
+experiment = ab_tester.create_experiment(
+    name="churn_model_v2_test",
+    models={
+        "control": "models/v1.0.0/model.pt",
+        "treatment": "models/v2.0.0/model.pt"
+    },
+    traffic_split={
+        "control": 0.8,
+        "treatment": 0.2
+    },
+    metrics=["accuracy", "latency", "conversion_rate"],
+    min_samples=10000,
+    significance_level=0.05
+)
+
+# Monitor experiment
+ab_tester.monitor(experiment)
+
+# Analyze results
+results = ab_tester.analyze(experiment)
+
+if results.is_significant and results.treatment_better:
+    ab_tester.promote("treatment")
+```
+
+## Experiment Types
+- A/B (two variants)
+- A/B/n (multiple variants)
+- Multi-armed bandit
+- Shadow mode
+
+## Safety Features
+- Automatic rollback
+- Circuit breaker
+- Error rate monitoring
+- Latency guardrails
+
+## Output Report
+- Statistical significance
+- Effect size
+- Confidence intervals
+- Recommendation
+
+## Progress
+- [ ] Experiment configured
+- [ ] Traffic routing active
+- [ ] Metrics collecting
+- [ ] Analysis running
+- [ ] Decision ready
+
+Make data-driven model rollout decisions.
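The significance check behind `results.is_significant` can be sketched as a two-proportion z-test on conversion counts, here with statsmodels and illustrative made-up counts for the 0.8/0.2 split:

```python
# Two-sided z-test comparing conversion rates of control vs. treatment.
from statsmodels.stats.proportion import proportions_ztest

conversions = [812, 231]  # control, treatment successes (illustrative numbers)
samples = [8000, 2000]    # observations per arm under the 0.8/0.2 split

stat, p_value = proportions_ztest(conversions, samples)
treatment_better = conversions[1] / samples[1] > conversions[0] / samples[0]
if p_value < 0.05 and treatment_better:
    print("treatment wins; promote")
```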
package/plugin/commands/omgdeploy/cloud.md
@@ -0,0 +1,89 @@
+---
+description: Deploy model to cloud platforms with auto-scaling, monitoring, and production configurations
+allowed-tools: Task, Read, Write, Bash, Grep, Glob
+argument-hint: <provider> [--config <config>]
+---
+
+# Cloud Deployment: $ARGUMENTS
+
+Deploy to cloud: **$ARGUMENTS**
+
+## Agent
+Uses **deployment-agent** for cloud deployment.
+
+## Parameters
+- **provider**: aws | gcp | azure
+- **config**: Path to deployment configuration
+
+## Cloud Providers
+
+### AWS
+- SageMaker endpoints
+- Lambda (serverless)
+- ECS/EKS containers
+- Elastic Inference
+
+### GCP
+- Vertex AI endpoints
+- Cloud Run
+- GKE containers
+- TPU support
+
+### Azure
+- Azure ML endpoints
+- Azure Functions
+- AKS containers
+- Cognitive Services
+
+## Code Template
+```python
+from omgkit.deployment import CloudDeployer
+
+deployer = CloudDeployer(provider="aws")
+
+endpoint = deployer.deploy(
+    model_path="models/best_model.pt",
+    config={
+        "instance_type": "ml.m5.xlarge",
+        "instance_count": 2,
+        "autoscaling": {
+            "min_capacity": 1,
+            "max_capacity": 10,
+            "target_invocations": 1000
+        },
+        "monitoring": {
+            "data_capture": True,
+            "capture_percentage": 10
+        }
+    },
+    endpoint_name="churn-predictor-prod"
+)
+
+# Test endpoint
+response = deployer.invoke(
+    endpoint_name="churn-predictor-prod",
+    payload={"features": [1.0, 2.0, 3.0]}
+)
+```
+
+## Features
+- Auto-scaling policies
+- Blue-green deployment
+- A/B testing
+- Monitoring & logging
+- Cost optimization
+
+## Infrastructure as Code
+- Terraform templates
+- CloudFormation
+- Pulumi scripts
+- CDK support
+
+## Progress
+- [ ] Credentials validated
+- [ ] Resources provisioned
+- [ ] Model deployed
+- [ ] Health verified
+- [ ] Monitoring active
+
+Deploy production-grade ML endpoints to cloud.
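On AWS, the invoke step corresponds to a plain boto3 call against the SageMaker runtime; a minimal sketch reusing the endpoint name from the template:

```python
# Serialize the payload as JSON and call the deployed SageMaker endpoint.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="churn-predictor-prod",
    ContentType="application/json",
    Body=json.dumps({"features": [1.0, 2.0, 3.0]}),
)
prediction = json.loads(response["Body"].read())
```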