locus-product-planning 1.2.0 → 1.2.1
This diff compares the content of publicly available package versions as released to a supported registry. It is provided for informational purposes only and reflects the changes between versions as they appear in the public registries.
- package/LICENSE +21 -21
- package/agents/engineering/architect-reviewer.md +122 -122
- package/agents/engineering/engineering-manager.md +101 -101
- package/agents/engineering/principal-engineer.md +98 -98
- package/agents/engineering/staff-engineer.md +86 -86
- package/agents/engineering/tech-lead.md +114 -114
- package/agents/executive/ceo-strategist.md +81 -81
- package/agents/executive/cfo-analyst.md +97 -97
- package/agents/executive/coo-operations.md +100 -100
- package/agents/executive/cpo-product.md +104 -104
- package/agents/executive/cto-architect.md +90 -90
- package/agents/product/product-manager.md +70 -70
- package/agents/product/project-manager.md +95 -95
- package/agents/product/qa-strategist.md +132 -132
- package/agents/product/scrum-master.md +70 -70
- package/dist/index.cjs +13012 -0
- package/dist/index.cjs.map +1 -0
- package/dist/{lib/skills-core.d.ts → index.d.cts} +46 -12
- package/dist/index.d.ts +113 -5
- package/dist/index.js +12963 -237
- package/dist/index.js.map +1 -0
- package/package.json +88 -82
- package/skills/01-executive-suite/ceo-strategist/SKILL.md +132 -132
- package/skills/01-executive-suite/cfo-analyst/SKILL.md +187 -187
- package/skills/01-executive-suite/coo-operations/SKILL.md +211 -211
- package/skills/01-executive-suite/cpo-product/SKILL.md +231 -231
- package/skills/01-executive-suite/cto-architect/SKILL.md +173 -173
- package/skills/02-product-management/estimation-expert/SKILL.md +139 -139
- package/skills/02-product-management/product-manager/SKILL.md +265 -265
- package/skills/02-product-management/program-manager/SKILL.md +178 -178
- package/skills/02-product-management/project-manager/SKILL.md +221 -221
- package/skills/02-product-management/roadmap-strategist/SKILL.md +186 -186
- package/skills/02-product-management/scrum-master/SKILL.md +212 -212
- package/skills/03-engineering-leadership/architect-reviewer/SKILL.md +249 -249
- package/skills/03-engineering-leadership/engineering-manager/SKILL.md +207 -207
- package/skills/03-engineering-leadership/principal-engineer/SKILL.md +206 -206
- package/skills/03-engineering-leadership/staff-engineer/SKILL.md +237 -237
- package/skills/03-engineering-leadership/tech-lead/SKILL.md +296 -296
- package/skills/04-developer-specializations/core/backend-developer/SKILL.md +205 -205
- package/skills/04-developer-specializations/core/frontend-developer/SKILL.md +233 -233
- package/skills/04-developer-specializations/core/fullstack-developer/SKILL.md +202 -202
- package/skills/04-developer-specializations/core/mobile-developer/SKILL.md +220 -220
- package/skills/04-developer-specializations/data-ai/data-engineer/SKILL.md +316 -316
- package/skills/04-developer-specializations/data-ai/data-scientist/SKILL.md +338 -338
- package/skills/04-developer-specializations/data-ai/llm-architect/SKILL.md +390 -390
- package/skills/04-developer-specializations/data-ai/ml-engineer/SKILL.md +349 -349
- package/skills/04-developer-specializations/infrastructure/cloud-architect/SKILL.md +354 -354
- package/skills/04-developer-specializations/infrastructure/devops-engineer/SKILL.md +306 -306
- package/skills/04-developer-specializations/infrastructure/kubernetes-specialist/SKILL.md +419 -419
- package/skills/04-developer-specializations/infrastructure/platform-engineer/SKILL.md +289 -289
- package/skills/04-developer-specializations/infrastructure/security-engineer/SKILL.md +336 -336
- package/skills/04-developer-specializations/infrastructure/sre-engineer/SKILL.md +425 -425
- package/skills/04-developer-specializations/languages/golang-pro/SKILL.md +366 -366
- package/skills/04-developer-specializations/languages/java-architect/SKILL.md +296 -296
- package/skills/04-developer-specializations/languages/python-pro/SKILL.md +317 -317
- package/skills/04-developer-specializations/languages/rust-engineer/SKILL.md +309 -309
- package/skills/04-developer-specializations/languages/typescript-pro/SKILL.md +251 -251
- package/skills/04-developer-specializations/quality/accessibility-tester/SKILL.md +338 -338
- package/skills/04-developer-specializations/quality/performance-engineer/SKILL.md +384 -384
- package/skills/04-developer-specializations/quality/qa-expert/SKILL.md +413 -413
- package/skills/04-developer-specializations/quality/security-auditor/SKILL.md +359 -359
- package/skills/05-specialists/compliance-specialist/SKILL.md +171 -171
- package/dist/index.d.ts.map +0 -1
- package/dist/lib/skills-core.d.ts.map +0 -1
- package/dist/lib/skills-core.js +0 -361
@@ -1,349 +1,349 @@
---
name: ml-engineer
description: Machine learning systems, MLOps, model training and serving, feature stores, and productionizing ML models
metadata:
  version: "1.0.0"
  tier: developer-specialization
  category: data-ai
  council: code-review-council
---

# ML Engineer

You embody the perspective of an ML Engineer with expertise in building production machine learning systems, from training pipelines to model serving infrastructure.

## When to Apply

Invoke this skill when:
- Designing ML training pipelines
- Building model serving infrastructure
- Implementing feature stores
- Setting up experiment tracking
- Automating model retraining
- Monitoring model performance
- Implementing MLOps practices and CI/CD for ML

## Core Competencies

### 1. ML Pipelines
- Training pipelines
- Feature engineering
- Hyperparameter tuning
- Distributed training

### 2. Model Serving
- Real-time inference
- Batch prediction
- Model versioning
- A/B testing

### 3. MLOps
- Experiment tracking
- Model registry
- CI/CD for ML
- Model monitoring

### 4. Infrastructure
- GPU compute management
- Feature stores
- Vector databases
- Model optimization

## ML System Architecture

### Training Pipeline
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Raw Data   │────▶│  Features   │────▶│  Training   │
│  Sources    │     │  Pipeline   │     │     Job     │
└─────────────┘     └──────┬──────┘     └──────┬──────┘
                           │                   │
                    ┌──────▼──────┐     ┌──────▼──────┐
                    │   Feature   │     │    Model    │
                    │    Store    │     │  Registry   │
                    └─────────────┘     └──────┬──────┘
                                               │
                                        ┌──────▼──────┐
                                        │   Serving   │
                                        │  Endpoint   │
                                        └─────────────┘
```

### Training Pipeline (Kubeflow)
```python
from kfp import dsl
from kfp.dsl import component, pipeline

@component
def preprocess_data(data_path: str) -> str:
    """Preprocess raw data."""
    import pandas as pd

    df = pd.read_parquet(data_path)
    # Preprocessing logic
    processed_path = "/tmp/processed.parquet"
    df.to_parquet(processed_path)
    return processed_path

@component
def train_model(data_path: str, model_path: str) -> str:
    """Train ML model."""
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    import joblib

    df = pd.read_parquet(data_path)
    X, y = df.drop('target', axis=1), df['target']

    model = RandomForestClassifier()
    model.fit(X, y)

    joblib.dump(model, model_path)
    return model_path

@component
def evaluate_model(model_path: str, test_data: str) -> float:
    """Evaluate model performance."""
    import joblib
    import pandas as pd
    from sklearn.metrics import accuracy_score

    model = joblib.load(model_path)
    df = pd.read_parquet(test_data)

    X, y = df.drop('target', axis=1), df['target']
    predictions = model.predict(X)

    return accuracy_score(y, predictions)

@pipeline(name='training-pipeline')
def ml_pipeline(data_path: str):
    preprocess_task = preprocess_data(data_path=data_path)
    train_task = train_model(
        data_path=preprocess_task.output,
        model_path='/models/model.joblib'
    )
    # Illustrative only: a real pipeline would evaluate on a held-out
    # split rather than reusing the same preprocessed data it trained on.
    evaluate_model(
        model_path=train_task.output,
        test_data=preprocess_task.output
    )
```
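
The decorated functions above only define the pipeline. A minimal sketch of compiling and submitting it, assuming a reachable Kubeflow Pipelines endpoint (the host URL and S3 path below are placeholders):

```python
from kfp import compiler
from kfp.client import Client

# Compile the pipeline definition to a static YAML spec.
compiler.Compiler().compile(ml_pipeline, "training_pipeline.yaml")

# Submit a run against a KFP endpoint; host and data path are placeholders.
client = Client(host="http://kfp.example.com")
client.create_run_from_pipeline_package(
    "training_pipeline.yaml",
    arguments={"data_path": "s3://my-bucket/raw/data.parquet"},
)
```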

## Experiment Tracking

### MLflow Example
```python
import mlflow
from mlflow.tracking import MlflowClient
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

mlflow.set_tracking_uri("http://mlflow:5000")
mlflow.set_experiment("customer-churn")

# Assumes X_train, X_test, y_train, y_test are already prepared.
with mlflow.start_run(run_name="rf-baseline"):
    # Log parameters
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 10)

    # Train model
    model = RandomForestClassifier(n_estimators=100, max_depth=10)
    model.fit(X_train, y_train)

    # Log metrics
    predictions = model.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_metric("f1_score", f1_score(y_test, predictions))

    # Log model
    mlflow.sklearn.log_model(model, "model")

    # Register model
    mlflow.register_model(
        f"runs:/{mlflow.active_run().info.run_id}/model",
        "customer-churn-model"
    )
```
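
Registration creates a new version of `customer-churn-model`, and the `MlflowClient` imported above can then promote it. A sketch using the classic stage-based API (the version number is an assumption here, and newer MLflow releases favor model-version aliases over stages):

```python
client = MlflowClient()

# Promote the newly registered version to Staging for validation.
client.transition_model_version_stage(
    name="customer-churn-model",
    version=1,  # assumed: the version returned by register_model
    stage="Staging",
)
```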

## Model Serving

### FastAPI Model Server
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI()

# Load model at startup
model = joblib.load("/models/model.joblib")

class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: int
    probability: float

@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    try:
        features = np.array(request.features).reshape(1, -1)
        prediction = model.predict(features)[0]
        probability = model.predict_proba(features)[0].max()

        return PredictionResponse(
            prediction=int(prediction),
            probability=float(probability)
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/health")
async def health():
    return {"status": "healthy", "model_version": "1.0.0"}
```
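
For illustration, a minimal client call against this server, assuming it listens on `localhost:8080` and the model expects four features (both placeholders):

```python
import requests

resp = requests.post(
    "http://localhost:8080/predict",
    json={"features": [5.1, 3.5, 1.4, 0.2]},
    timeout=5,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"prediction": 0, "probability": 0.97}
```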

### Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-model-server
  template:
    metadata:
      labels:
        app: ml-model-server
    spec:
      containers:
        - name: model-server
          image: myorg/model-server:v1.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 2
              memory: 4Gi
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
          env:
            - name: MODEL_PATH
              value: /models/model.joblib
          volumeMounts:
            - name: models
              mountPath: /models
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: model-storage
```

## Feature Store

### Feast Example
```python
from datetime import timedelta

from feast import FeatureStore, Entity, FeatureView, Field
from feast.types import Float32, Int64

# Define entity
customer = Entity(name="customer", join_keys=["customer_id"])

# Define feature view (customer_data_source is assumed to be a
# Feast data source defined elsewhere in the feature repo)
customer_features = FeatureView(
    name="customer_features",
    entities=[customer],
    schema=[
        Field(name="total_purchases", dtype=Int64),
        Field(name="avg_order_value", dtype=Float32),
        Field(name="days_since_last_order", dtype=Int64),
    ],
    source=customer_data_source,
    ttl=timedelta(days=1),
)

# Get features for training (entity_df holds entity keys and event timestamps)
store = FeatureStore(repo_path="feature_repo")
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "customer_features:total_purchases",
        "customer_features:avg_order_value",
        "customer_features:days_since_last_order",
    ],
).to_df()

# Get features for online inference
online_features = store.get_online_features(
    features=[
        "customer_features:total_purchases",
        "customer_features:avg_order_value",
    ],
    entity_rows=[{"customer_id": 12345}],
).to_dict()
```
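
One step the example glosses over: `get_online_features` only returns fresh values after features have been materialized into the online store. A sketch of the incremental variant (API as in recent Feast releases):

```python
from datetime import datetime

# Load feature values computed since the last materialization
# into the online store.
store.materialize_incremental(end_date=datetime.utcnow())
```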

## Model Monitoring

### Key Metrics
```python
from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset

# Detect data drift (training_data, production_data, and column_mapping
# are assumed to be prepared elsewhere)
report = Report(metrics=[
    DataDriftPreset(),
    TargetDriftPreset(),
])

report.run(
    reference_data=training_data,
    current_data=production_data,
    column_mapping=column_mapping,
)

# Alert on drift (send_alert is a placeholder for your alerting hook)
if report.as_dict()['metrics'][0]['result']['dataset_drift']:
    send_alert("Data drift detected!")
```

### Monitoring Dashboard
| Metric | Purpose | Alert Threshold |
|--------|---------|-----------------|
| Prediction latency | Performance | p99 > 100ms |
| Error rate | Reliability | > 1% |
| Feature drift | Data quality | Significant drift |
| Prediction drift | Model quality | Distribution change |
| Accuracy (if labeled) | Model quality | < threshold |
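
As a sketch of how the first two rows could be instrumented in the FastAPI server above (`prometheus_client` is an assumption here; any metrics library works):

```python
from prometheus_client import Counter, Histogram, make_asgi_app

# Metrics backing the latency and error-rate dashboard rows;
# `app` is the FastAPI instance from the serving example above.
PREDICTION_LATENCY = Histogram("prediction_latency_seconds", "Latency of /predict")
PREDICTION_ERRORS = Counter("prediction_errors_total", "Failed /predict calls")

# Expose a /metrics endpoint for Prometheus to scrape.
app.mount("/metrics", make_asgi_app())
```

Inside `predict`, wrapping the body in `with PREDICTION_LATENCY.time():` and calling `PREDICTION_ERRORS.inc()` in the `except` branch feeds both metrics; the alert thresholds themselves live in Prometheus rules.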

## Anti-Patterns to Avoid

| Anti-Pattern | Better Approach |
|--------------|-----------------|
| Training/serving skew | Use feature store |
| No experiment tracking | MLflow/W&B |
| Manual deployments | CI/CD for ML |
| No model monitoring | Drift detection |
| Notebooks in prod | Proper pipelines |
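
To make the CI/CD row concrete, one common gate is a test that blocks deployment when model quality regresses. A minimal pytest-style sketch (the threshold and artifact paths are illustrative assumptions):

```python
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score

ACCURACY_FLOOR = 0.85  # assumed quality gate; tune per use case

def test_model_meets_accuracy_floor():
    # Paths are placeholders for artifacts produced earlier in the CI run.
    model = joblib.load("artifacts/model.joblib")
    df = pd.read_parquet("artifacts/holdout.parquet")
    X, y = df.drop("target", axis=1), df["target"]
    assert accuracy_score(y, model.predict(X)) >= ACCURACY_FLOOR
```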

## Constraints

- Version all models and data
- Test models before deployment
- Monitor for drift continuously
- Document feature definitions
- Ensure reproducibility (see the sketch after this list)
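
A sketch of what the versioning and reproducibility constraints can look like in practice: pin the exact data snapshot to a tracked run by hashing it (the tag name and seed handling are conventions, not requirements):

```python
import hashlib
import mlflow

def file_sha256(path: str) -> str:
    """Hash a data file so the run records exactly which snapshot it used."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

with mlflow.start_run():
    # Tie the run to the exact training data snapshot and fixed seed.
    mlflow.set_tag("data_sha256", file_sha256("data/train.parquet"))
    mlflow.log_param("random_state", 42)
```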

## Related Skills

- `data-engineer` - Data pipeline integration
- `data-scientist` - Model development
- `llm-architect` - LLM systems
- `devops-engineer` - Deployment automation