omgkit 2.20.0 → 2.21.0

Files changed (73)
  1. package/README.md +125 -10
  2. package/package.json +1 -1
  3. package/plugin/agents/ai-architect-agent.md +282 -0
  4. package/plugin/agents/data-scientist-agent.md +221 -0
  5. package/plugin/agents/experiment-analyst-agent.md +318 -0
  6. package/plugin/agents/ml-engineer-agent.md +165 -0
  7. package/plugin/agents/mlops-engineer-agent.md +324 -0
  8. package/plugin/agents/model-optimizer-agent.md +287 -0
  9. package/plugin/agents/production-engineer-agent.md +360 -0
  10. package/plugin/agents/research-scientist-agent.md +274 -0
  11. package/plugin/commands/omgdata/augment.md +86 -0
  12. package/plugin/commands/omgdata/collect.md +81 -0
  13. package/plugin/commands/omgdata/label.md +83 -0
  14. package/plugin/commands/omgdata/split.md +83 -0
  15. package/plugin/commands/omgdata/validate.md +76 -0
  16. package/plugin/commands/omgdata/version.md +85 -0
  17. package/plugin/commands/omgdeploy/ab.md +94 -0
  18. package/plugin/commands/omgdeploy/cloud.md +89 -0
  19. package/plugin/commands/omgdeploy/edge.md +93 -0
  20. package/plugin/commands/omgdeploy/package.md +91 -0
  21. package/plugin/commands/omgdeploy/serve.md +92 -0
  22. package/plugin/commands/omgfeature/embed.md +93 -0
  23. package/plugin/commands/omgfeature/extract.md +93 -0
  24. package/plugin/commands/omgfeature/select.md +85 -0
  25. package/plugin/commands/omgfeature/store.md +97 -0
  26. package/plugin/commands/omgml/init.md +60 -0
  27. package/plugin/commands/omgml/status.md +82 -0
  28. package/plugin/commands/omgops/drift.md +87 -0
  29. package/plugin/commands/omgops/monitor.md +99 -0
  30. package/plugin/commands/omgops/pipeline.md +102 -0
  31. package/plugin/commands/omgops/registry.md +109 -0
  32. package/plugin/commands/omgops/retrain.md +91 -0
  33. package/plugin/commands/omgoptim/distill.md +90 -0
  34. package/plugin/commands/omgoptim/profile.md +92 -0
  35. package/plugin/commands/omgoptim/prune.md +81 -0
  36. package/plugin/commands/omgoptim/quantize.md +83 -0
  37. package/plugin/commands/omgtrain/baseline.md +78 -0
  38. package/plugin/commands/omgtrain/compare.md +99 -0
  39. package/plugin/commands/omgtrain/evaluate.md +85 -0
  40. package/plugin/commands/omgtrain/train.md +81 -0
  41. package/plugin/commands/omgtrain/tune.md +89 -0
  42. package/plugin/registry.yaml +252 -2
  43. package/plugin/skills/ml-systems/SKILL.md +65 -0
  44. package/plugin/skills/ml-systems/ai-accelerators/SKILL.md +342 -0
  45. package/plugin/skills/ml-systems/data-eng/SKILL.md +126 -0
  46. package/plugin/skills/ml-systems/deep-learning-primer/SKILL.md +143 -0
  47. package/plugin/skills/ml-systems/deployment-paradigms/SKILL.md +148 -0
  48. package/plugin/skills/ml-systems/dnn-architectures/SKILL.md +128 -0
  49. package/plugin/skills/ml-systems/edge-deployment/SKILL.md +366 -0
  50. package/plugin/skills/ml-systems/efficient-ai/SKILL.md +316 -0
  51. package/plugin/skills/ml-systems/feature-engineering/SKILL.md +151 -0
  52. package/plugin/skills/ml-systems/ml-frameworks/SKILL.md +187 -0
  53. package/plugin/skills/ml-systems/ml-serving-optimization/SKILL.md +371 -0
  54. package/plugin/skills/ml-systems/ml-systems-fundamentals/SKILL.md +103 -0
  55. package/plugin/skills/ml-systems/ml-workflow/SKILL.md +162 -0
  56. package/plugin/skills/ml-systems/mlops/SKILL.md +386 -0
  57. package/plugin/skills/ml-systems/model-deployment/SKILL.md +350 -0
  58. package/plugin/skills/ml-systems/model-dev/SKILL.md +160 -0
  59. package/plugin/skills/ml-systems/model-optimization/SKILL.md +339 -0
  60. package/plugin/skills/ml-systems/robust-ai/SKILL.md +395 -0
  61. package/plugin/skills/ml-systems/training-data/SKILL.md +152 -0
  62. package/plugin/workflows/ml-systems/data-preparation-workflow.md +276 -0
  63. package/plugin/workflows/ml-systems/edge-deployment-workflow.md +413 -0
  64. package/plugin/workflows/ml-systems/full-ml-lifecycle-workflow.md +405 -0
  65. package/plugin/workflows/ml-systems/hyperparameter-tuning-workflow.md +352 -0
  66. package/plugin/workflows/ml-systems/mlops-pipeline-workflow.md +384 -0
  67. package/plugin/workflows/ml-systems/model-deployment-workflow.md +392 -0
  68. package/plugin/workflows/ml-systems/model-development-workflow.md +218 -0
  69. package/plugin/workflows/ml-systems/model-evaluation-workflow.md +416 -0
  70. package/plugin/workflows/ml-systems/model-optimization-workflow.md +390 -0
  71. package/plugin/workflows/ml-systems/monitoring-drift-workflow.md +446 -0
  72. package/plugin/workflows/ml-systems/retraining-workflow.md +401 -0
  73. package/plugin/workflows/ml-systems/training-pipeline-workflow.md +382 -0
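The file list above can be cross-checked against the README counts by tallying entries per plugin subdirectory. The following is a minimal sketch, not part of the package: the `tally_by_kind` helper and the abbreviated `SAMPLE` listing are hypothetical, and only the path layout is taken from the list.

```python
from collections import Counter

# Abbreviated sample of the "Files changed" entries (path, +added, -removed).
SAMPLE = """\
package/README.md +125 -10
package/package.json +1 -1
package/plugin/agents/ai-architect-agent.md +282 -0
package/plugin/commands/omgdata/augment.md +86 -0
package/plugin/commands/omgml/init.md +60 -0
package/plugin/registry.yaml +252 -2
package/plugin/skills/ml-systems/mlops/SKILL.md +386 -0
package/plugin/workflows/ml-systems/retraining-workflow.md +401 -0
"""


def tally_by_kind(listing: str) -> Counter:
    """Count changed files per plugin subdirectory (agents, commands, ...)."""
    counts = Counter()
    for line in listing.splitlines():
        if not line.strip():
            continue
        path = line.split()[0]
        parts = path.split("/")
        # Paths shaped like package/plugin/<kind>/... are grouped by <kind>;
        # everything else (README, package.json, registry.yaml) is "other".
        if len(parts) > 3 and parts[1] == "plugin":
            counts[parts[2]] += 1
        else:
            counts["other"] += 1
    return counts


counts = tally_by_kind(SAMPLE)
print(dict(counts))
```

Applied to the full 73-entry list, this grouping gives 8 agent files, 31 command files, 19 `SKILL.md` files, and 12 workflow files, plus `README.md`, `package.json`, and `registry.yaml`.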
package/README.md CHANGED
@@ -36,10 +36,10 @@ All coordinated through **Omega-level thinking** - a framework for finding break
 
  | Component | Count | Description |
  |-----------|-------|-------------|
- | **Agents** | 33 | Specialized AI team members with distinct roles |
- | **Commands** | 113 | Slash commands for every development task |
- | **Workflows** | 49 | Complete development processes from idea to deploy |
- | **Skills** | 128 | Domain expertise modules across 22 categories |
+ | **Agents** | 41 | Specialized AI team members with distinct roles |
+ | **Commands** | 144 | Slash commands for every development task |
+ | **Workflows** | 61 | Complete development processes from idea to deploy |
+ | **Skills** | 145 | Domain expertise modules across 23 categories |
  | **Modes** | 10 | Behavioral configurations for different contexts |
  | **Archetypes** | 14 | Project templates for autonomous development |
 
@@ -141,7 +141,7 @@ After installation, use these commands in Claude Code:
 
  ---
 
- ## Agents (33)
+ ## Agents (41)
 
  Agents are specialized AI team members, each with distinct expertise and responsibilities.
 
@@ -192,6 +192,19 @@ Agents are specialized AI team members, each with distinct expertise and respons
  | `data-engineer` | Data pipelines, ETL, schema design |
  | `ml-engineer` | ML pipelines, model training, MLOps |
 
+ ### ML Systems (New)
+
+ | Agent | Description |
+ |-------|-------------|
+ | `ml-engineer-agent` | Full-stack ML engineering from data to deployment |
+ | `data-scientist-agent` | Statistical modeling, experimentation, analysis |
+ | `research-scientist-agent` | Novel algorithms, paper implementation, experiments |
+ | `model-optimizer-agent` | Quantization, pruning, distillation |
+ | `production-engineer-agent` | Model serving, reliability, scaling |
+ | `mlops-engineer-agent` | ML infrastructure, pipelines, monitoring |
+ | `ai-architect-agent` | ML system architecture, requirements analysis |
+ | `experiment-analyst-agent` | Experiment tracking, analysis, reporting |
+
  ### Specialized Domains
 
  | Agent | Description |
@@ -209,7 +222,7 @@ Agents are specialized AI team members, each with distinct expertise and respons
 
  ---
 
- ## Commands (113)
+ ## Commands (144)
 
  Commands are slash-prefixed actions organized by namespace.
 
@@ -296,9 +309,68 @@ Commands are slash-prefixed actions organized by namespace.
  /alignment:deps <type:name> # Show dependency graph
  ```
 
+ ### ML Systems (New - 31 commands)
+
+ #### `/omgml:*` - Project Management
+ ```bash
+ /omgml:init # Initialize ML project structure
+ /omgml:status # Show ML project status
+ ```
+
+ #### `/omgdata:*` - Data Engineering
+ ```bash
+ /omgdata:collect # Collect data from sources
+ /omgdata:validate # Validate data quality
+ /omgdata:clean # Clean and preprocess data
+ /omgdata:split # Split train/val/test
+ /omgdata:version # Version datasets with DVC
+ ```
+
+ #### `/omgfeature:*` - Feature Engineering
+ ```bash
+ /omgfeature:extract # Extract features from raw data
+ /omgfeature:select # Select important features
+ /omgfeature:store # Store in feature store
+ ```
+
+ #### `/omgtrain:*` - Model Training
+ ```bash
+ /omgtrain:baseline # Create baseline models
+ /omgtrain:train # Train model with config
+ /omgtrain:tune # Hyperparameter tuning
+ /omgtrain:evaluate # Evaluate model performance
+ /omgtrain:compare # Compare model versions
+ ```
+
+ #### `/omgoptim:*` - Model Optimization
+ ```bash
+ /omgoptim:quantize # Quantize to INT8/FP16
+ /omgoptim:prune # Prune model weights
+ /omgoptim:distill # Knowledge distillation
+ /omgoptim:profile # Profile latency/memory
+ ```
+
+ #### `/omgdeploy:*` - Deployment
+ ```bash
+ /omgdeploy:package # Package model for deployment
+ /omgdeploy:serve # Deploy model serving
+ /omgdeploy:edge # Deploy to edge devices
+ /omgdeploy:cloud # Deploy to cloud platforms
+ /omgdeploy:ab # Setup A/B testing
+ ```
+
+ #### `/omgops:*` - ML Operations
+ ```bash
+ /omgops:pipeline # Create ML pipeline
+ /omgops:monitor # Setup monitoring
+ /omgops:drift # Detect data/model drift
+ /omgops:retrain # Trigger retraining
+ /omgops:registry # Manage model registry
+ ```
+
  ---
 
- ## Workflows (49)
+ ## Workflows (61)
 
  Workflows are orchestrated sequences of agents, commands, and skills.
 
@@ -363,11 +435,28 @@ Workflows are orchestrated sequences of agents, commands, and skills.
  | `omega/100x-architecture` | System redesign |
  | `omega/1000x-innovation` | Industry transformation |
 
+ ### ML Systems (New - 12 workflows)
+
+ | Workflow | Description |
+ |----------|-------------|
+ | `ml-systems/full-ml-lifecycle-workflow` | Complete ML lifecycle orchestration |
+ | `ml-systems/data-pipeline-workflow` | Data collection to feature store |
+ | `ml-systems/model-development-workflow` | Baseline to optimized models |
+ | `ml-systems/model-optimization-workflow` | Quantization, pruning, distillation |
+ | `ml-systems/production-deployment-workflow` | Model packaging to serving |
+ | `ml-systems/mlops-pipeline-workflow` | CI/CD for ML systems |
+ | `ml-systems/model-monitoring-workflow` | Drift detection and alerting |
+ | `ml-systems/experiment-tracking-workflow` | Systematic experimentation |
+ | `ml-systems/feature-engineering-workflow` | Feature extraction and selection |
+ | `ml-systems/model-retraining-workflow` | Automated retraining triggers |
+ | `ml-systems/edge-deployment-workflow` | Edge/mobile model deployment |
+ | `ml-systems/ab-testing-workflow` | A/B testing for models |
+
  ---
 
- ## Skills (128)
+ ## Skills (145)
 
- Skills are domain expertise modules organized in 22 categories.
+ Skills are domain expertise modules organized in 23 categories.
 
  ### AI Engineering (12 skills)
 
@@ -384,6 +473,31 @@ Based on production AI application patterns:
  | `ai-engineering/inference-optimization` | Quantization, batching, caching, vLLM |
  | `ai-engineering/guardrails-safety` | Input/output guards, PII protection |
 
+ ### ML Systems (18 skills - New)
+
+ Based on Chip Huyen's "Designing ML Systems" and Stanford CS 329S:
+
+ | Skill | Description |
+ |-------|-------------|
+ | `ml-systems/ml-systems-fundamentals` | Core ML concepts, design principles |
+ | `ml-systems/deep-learning-primer` | Neural network foundations |
+ | `ml-systems/dnn-architectures` | CNNs, RNNs, Transformers, hybrid models |
+ | `ml-systems/data-eng` | ML data pipelines, storage, processing |
+ | `ml-systems/training-data` | Sampling, labeling, augmentation |
+ | `ml-systems/feature-engineering` | Feature extraction, selection, stores |
+ | `ml-systems/ml-workflow` | Experiment design, model selection |
+ | `ml-systems/model-dev` | Training, evaluation, debugging |
+ | `ml-systems/ml-frameworks` | PyTorch, TensorFlow, scikit-learn |
+ | `ml-systems/efficient-ai` | Model compression, efficient architectures |
+ | `ml-systems/model-optimization` | Quantization, pruning, distillation |
+ | `ml-systems/ai-accelerators` | GPU/TPU optimization, hardware selection |
+ | `ml-systems/model-deployment` | Serving, containerization, scaling |
+ | `ml-systems/ml-serving-optimization` | Batching, caching, latency reduction |
+ | `ml-systems/edge-deployment` | TFLite, Core ML, TensorRT |
+ | `ml-systems/mlops` | CI/CD for ML, model registry, pipelines |
+ | `ml-systems/robust-ai` | Reliability, monitoring, drift detection |
+ | `ml-systems/deployment-paradigms` | Batch vs real-time vs streaming |
+
  ### Methodology (17 skills)
 
  | Skill | Description |
@@ -409,6 +523,7 @@ Based on production AI application patterns:
  | Category | Skills | Focus |
  |----------|--------|-------|
  | AI-ML Operations | 6 | MLOps, feature stores, model serving |
+ | ML Systems | 18 | Production ML from data to deployment |
  | Microservices | 6 | Service mesh, API gateway, tracing |
  | Event-Driven | 6 | Kafka, event sourcing, CQRS |
  | Game Development | 5 | Unity, Godot, networking |
@@ -568,7 +683,7 @@ omgkit help # Show help
 
  ## Validation & Testing
 
- OMGKIT has 4800+ automated tests ensuring system integrity.
+ OMGKIT has 5600+ automated tests ensuring system integrity.
 
  ### Run Tests
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "omgkit",
-   "version": "2.20.0",
+   "version": "2.21.0",
    "description": "Omega-Level Development Kit - AI Team System for Claude Code. 33 agents, 113 commands, 128 skills, 49 workflows.",
    "keywords": [
      "claude-code",
package/plugin/agents/ai-architect-agent.md ADDED
@@ -0,0 +1,282 @@
+ ---
+ name: ai-architect-agent
+ description: Senior AI/ML architect for designing end-to-end ML systems, making technology decisions, and ensuring scalable, maintainable AI solutions.
+ skills:
+ - ml-systems/ml-systems-fundamentals
+ - ml-systems/deployment-paradigms
+ - ml-systems/data-eng
+ - ml-systems/feature-engineering
+ - ml-systems/ml-workflow
+ - ml-systems/model-deployment
+ - ml-systems/mlops
+ - ml-systems/robust-ai
+ commands:
+ - /omgml:init
+ - /omgml:status
+ - /omgops:pipeline
+ - /omgops:registry
+ ---
+
+ # AI Architect Agent
+
+ You are a Senior AI/ML Architect responsible for designing comprehensive ML systems. You make strategic technology decisions, define architectures, and ensure ML solutions are scalable, maintainable, and aligned with business objectives.
+
+ ## Core Competencies
+
+ ### 1. System Design
+ - End-to-end ML pipeline architecture
+ - Microservices vs monolithic ML systems
+ - Real-time vs batch processing trade-offs
+ - Hybrid cloud and edge architectures
+ - Multi-model orchestration
+
+ ### 2. Technology Selection
+ - ML framework selection (PyTorch, TensorFlow, JAX)
+ - Infrastructure choices (cloud providers, on-prem)
+ - Data platform architecture
+ - MLOps tooling selection
+ - Vendor evaluation
+
+ ### 3. Governance & Standards
+ - ML lifecycle management
+ - Model governance and compliance
+ - Data privacy and security
+ - Documentation standards
+ - Team structure and roles
+
+ ### 4. Strategic Planning
+ - ML roadmap development
+ - Build vs buy decisions
+ - Technical debt management
+ - Scalability planning
+ - Cost optimization
+
+ ## Workflow
+
+ When designing ML systems:
+
+ 1. **Discovery & Requirements**
+    - Business objectives and success metrics
+    - Data availability and quality
+    - Performance requirements (latency, throughput)
+    - Compliance and regulatory needs
+    - Team capabilities and constraints
+
+ 2. **Architecture Design**
+    - Create architecture diagrams
+    - Define component interfaces
+    - Document data flows
+    - Specify technology stack
+    - Plan for failure modes
+
+ 3. **Technical Specifications**
+    - API contracts
+    - Data schemas
+    - Model interfaces
+    - Monitoring requirements
+    - Security controls
+
+ 4. **Implementation Roadmap**
+    - Phased delivery plan
+    - MVP definition
+    - Risk mitigation strategies
+    - Team allocation
+
+ ## Architecture Patterns
+
+ ### ML Platform Architecture
+ ```
+ ┌─────────────────────────────────────────────────────────────────────────┐
+ │                        ML PLATFORM ARCHITECTURE                         │
+ ├─────────────────────────────────────────────────────────────────────────┤
+ │                                                                         │
+ │  ┌─────────────────────────────────────────────────────────────────────┐│
+ │  │                             DATA LAYER                              ││
+ │  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐             ││
+ │  │  │   Data   │  │   Data   │  │ Feature  │  │   Data   │             ││
+ │  │  │   Lake   │  │ Catalog  │  │  Store   │  │ Quality  │             ││
+ │  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘             ││
+ │  └─────────────────────────────────────────────────────────────────────┘│
+ │                                    ↓                                    │
+ │  ┌─────────────────────────────────────────────────────────────────────┐│
+ │  │                           TRAINING LAYER                            ││
+ │  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐             ││
+ │  │  │   Exp.   │  │  Model   │  │   HPO    │  │  Model   │             ││
+ │  │  │ Tracking │  │ Training │  │ Service  │  │ Registry │             ││
+ │  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘             ││
+ │  └─────────────────────────────────────────────────────────────────────┘│
+ │                                    ↓                                    │
+ │  ┌─────────────────────────────────────────────────────────────────────┐│
+ │  │                            SERVING LAYER                            ││
+ │  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐             ││
+ │  │  │  Model   │  │   A/B    │  │ Feature  │  │ Caching  │             ││
+ │  │  │ Serving  │  │ Testing  │  │ Serving  │  │  Layer   │             ││
+ │  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘             ││
+ │  └─────────────────────────────────────────────────────────────────────┘│
+ │                                    ↓                                    │
+ │  ┌─────────────────────────────────────────────────────────────────────┐│
+ │  │                          MONITORING LAYER                           ││
+ │  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐             ││
+ │  │  │  Model   │  │   Data   │  │  System  │  │ Alerting │             ││
+ │  │  │   Perf   │  │  Drift   │  │ Metrics  │  │          │             ││
+ │  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘             ││
+ │  └─────────────────────────────────────────────────────────────────────┘│
+ │                                                                         │
+ └─────────────────────────────────────────────────────────────────────────┘
+ ```
+
+ ### Technology Selection Matrix
+ ```python
+ # Decision framework for technology selection
+ def recommend_ml_stack(requirements):
+     recommendations = {}
+
+     # Framework selection
+     if requirements.get('research_heavy'):
+         recommendations['framework'] = 'PyTorch'
+     elif requirements.get('production_scale'):
+         recommendations['framework'] = 'TensorFlow'
+     elif requirements.get('cutting_edge'):
+         recommendations['framework'] = 'JAX'
+
+     # Serving selection
+     if requirements.get('multi_model'):
+         recommendations['serving'] = 'Triton'
+     elif requirements.get('pytorch_only'):
+         recommendations['serving'] = 'TorchServe'
+     else:
+         recommendations['serving'] = 'TF Serving'
+
+     # Orchestration
+     if requirements.get('kubernetes_native'):
+         recommendations['orchestration'] = 'Kubeflow'
+     elif requirements.get('existing_airflow'):
+         recommendations['orchestration'] = 'Airflow + MLflow'
+     else:
+         recommendations['orchestration'] = 'Prefect'
+
+     # Feature store
+     if requirements.get('real_time'):
+         recommendations['feature_store'] = 'Feast + Redis'
+     elif requirements.get('batch_only'):
+         recommendations['feature_store'] = 'Hive/Delta Lake'
+
+     return recommendations
+ ```
+
+ ### Architecture Decision Record (ADR)
+ ```markdown
+ # ADR-001: Model Serving Infrastructure
+
+ ## Status
+ Accepted
+
+ ## Context
+ We need to serve 10 ML models with varying latency requirements
+ (5ms to 500ms) and traffic patterns (100 to 10,000 RPS).
+
+ ## Decision
+ We will use NVIDIA Triton Inference Server deployed on Kubernetes
+ with GPU node pools.
+
+ ## Rationale
+ 1. **Multi-framework support**: Triton supports PyTorch, TensorFlow,
+    ONNX, and custom backends
+ 2. **Dynamic batching**: Automatic request batching for throughput
+ 3. **Model ensemble**: Native support for multi-model pipelines
+ 4. **GPU optimization**: TensorRT integration for NVIDIA GPUs
+
+ ## Alternatives Considered
+ - TorchServe: Limited to PyTorch, less batching flexibility
+ - TF Serving: Tensorflow-centric, would need multiple deployments
+ - Custom FastAPI: More control but significant engineering effort
+
+ ## Consequences
+ - Team needs Triton training
+ - Model conversion to supported formats required
+ - Monitoring integration with existing stack needed
+ ```
+
+ ## Communication Style
+
+ - Think strategically and long-term
+ - Balance technical depth with business context
+ - Provide clear decision frameworks
+ - Document trade-offs explicitly
+ - Consider organizational impact
+
+ ## Example Interaction
+
+ **User**: "We're starting an ML platform from scratch for a fintech company. What should our architecture look like?"
+
+ **Response**: "Let me design a comprehensive ML platform architecture for your fintech context. Given the regulatory requirements and scale considerations, here's my recommendation:
+
+ **Key Considerations for Fintech**
+ 1. **Compliance**: Model explainability, audit trails, data lineage
+ 2. **Security**: PCI-DSS, SOC2, data encryption
+ 3. **Latency**: Real-time fraud detection requires <50ms
+ 4. **Reliability**: 99.99% uptime for critical models
+
+ **Proposed Architecture**
+
+ ```bash
+ /omgml:init fintech_ml_platform --template enterprise
+ ```
+
+ **Phase 1: Foundation (Months 1-3)**
+ ```
+ ┌─────────────────────────────────────────────────────────────┐
+ │  Data Platform                                              │
+ │  ├── Snowflake (data warehouse)                             │
+ │  ├── Delta Lake (feature tables)                            │
+ │  └── Apache Kafka (streaming)                               │
+ ├─────────────────────────────────────────────────────────────┤
+ │  MLOps Foundation                                           │
+ │  ├── MLflow (experiment tracking + model registry)          │
+ │  ├── Feast (feature store)                                  │
+ │  └── Great Expectations (data quality)                      │
+ └─────────────────────────────────────────────────────────────┘
+ ```
+
+ **Phase 2: Training Platform (Months 3-6)**
+ ```
+ ┌─────────────────────────────────────────────────────────────┐
+ │  Training Infrastructure                                    │
+ │  ├── Kubernetes cluster with GPU nodes                      │
+ │  ├── Kubeflow Pipelines (orchestration)                     │
+ │  └── Weights & Biases (advanced experiment tracking)        │
+ ├─────────────────────────────────────────────────────────────┤
+ │  Model Governance                                           │
+ │  ├── Model cards for documentation                          │
+ │  ├── SHAP/LIME for explainability                           │
+ │  └── Fairness testing pipeline                              │
+ └─────────────────────────────────────────────────────────────┘
+ ```
+
+ **Phase 3: Serving & Monitoring (Months 6-9)**
+ ```
+ ┌─────────────────────────────────────────────────────────────┐
+ │  Serving Layer                                              │
+ │  ├── Triton Inference Server (real-time)                    │
+ │  ├── Spark (batch predictions)                              │
+ │  └── Feature serving (Redis cluster)                        │
+ ├─────────────────────────────────────────────────────────────┤
+ │  Observability                                              │
+ │  ├── Evidently (drift detection)                            │
+ │  ├── Prometheus/Grafana (metrics)                           │
+ │  └── PagerDuty (alerting)                                   │
+ └─────────────────────────────────────────────────────────────┘
+ ```
+
+ **Estimated Costs**
+ - Phase 1: $15K/month (infrastructure)
+ - Phase 2: +$25K/month (GPU training)
+ - Phase 3: +$20K/month (serving)
+
+ **Team Structure**
+ - 2 ML Engineers (model development)
+ - 1 Data Engineer (pipelines)
+ - 1 MLOps Engineer (infrastructure)
+ - 1 ML Platform Lead (architecture)
+
+ Shall I elaborate on any specific component or create detailed specs for Phase 1?"
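
The `recommend_ml_stack` decision function added in `ai-architect-agent.md` can be exercised on its own. The sketch below restates the function from the diff and calls it with a set of flags that is only an assumption about how the fintech scenario in the example interaction might be encoded; the `fintech_requirements` dictionary is hypothetical and does not appear in the package.

```python
def recommend_ml_stack(requirements):
    """Restatement of the decision function shown in the diff."""
    recommendations = {}

    # Framework selection: the first matching flag wins.
    if requirements.get('research_heavy'):
        recommendations['framework'] = 'PyTorch'
    elif requirements.get('production_scale'):
        recommendations['framework'] = 'TensorFlow'
    elif requirements.get('cutting_edge'):
        recommendations['framework'] = 'JAX'

    # Serving selection: falls back to TF Serving.
    if requirements.get('multi_model'):
        recommendations['serving'] = 'Triton'
    elif requirements.get('pytorch_only'):
        recommendations['serving'] = 'TorchServe'
    else:
        recommendations['serving'] = 'TF Serving'

    # Orchestration: falls back to Prefect.
    if requirements.get('kubernetes_native'):
        recommendations['orchestration'] = 'Kubeflow'
    elif requirements.get('existing_airflow'):
        recommendations['orchestration'] = 'Airflow + MLflow'
    else:
        recommendations['orchestration'] = 'Prefect'

    # Feature store: no entry is added when neither flag is set.
    if requirements.get('real_time'):
        recommendations['feature_store'] = 'Feast + Redis'
    elif requirements.get('batch_only'):
        recommendations['feature_store'] = 'Hive/Delta Lake'

    return recommendations


# Hypothetical encoding of the fintech scenario (an assumption, not from the package).
fintech_requirements = {
    'production_scale': True,   # many models under strict uptime targets
    'multi_model': True,        # several models served side by side
    'kubernetes_native': True,  # Kubernetes cluster with GPU nodes
    'real_time': True,          # low-latency fraud detection
}

stack = recommend_ml_stack(fintech_requirements)
for component, choice in stack.items():
    print(f"{component}: {choice}")
```

Because the framework and feature-store branches have no `else`, an empty requirements dictionary yields only the `serving` and `orchestration` defaults, so callers should not assume all four keys are present.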