@zigrivers/scaffold 3.8.0 → 3.9.1
- package/README.md +73 -8
- package/content/knowledge/browser-extension/browser-extension-architecture.md +195 -0
- package/content/knowledge/browser-extension/browser-extension-content-scripts.md +264 -0
- package/content/knowledge/browser-extension/browser-extension-conventions.md +156 -0
- package/content/knowledge/browser-extension/browser-extension-cross-browser.md +229 -0
- package/content/knowledge/browser-extension/browser-extension-dev-environment.md +247 -0
- package/content/knowledge/browser-extension/browser-extension-manifest.md +220 -0
- package/content/knowledge/browser-extension/browser-extension-project-structure.md +183 -0
- package/content/knowledge/browser-extension/browser-extension-requirements.md +107 -0
- package/content/knowledge/browser-extension/browser-extension-security.md +202 -0
- package/content/knowledge/browser-extension/browser-extension-service-workers.md +265 -0
- package/content/knowledge/browser-extension/browser-extension-store-submission.md +155 -0
- package/content/knowledge/browser-extension/browser-extension-testing.md +270 -0
- package/content/knowledge/data-pipeline/data-pipeline-architecture.md +175 -0
- package/content/knowledge/data-pipeline/data-pipeline-batch-patterns.md +263 -0
- package/content/knowledge/data-pipeline/data-pipeline-conventions.md +176 -0
- package/content/knowledge/data-pipeline/data-pipeline-dev-environment.md +350 -0
- package/content/knowledge/data-pipeline/data-pipeline-orchestration.md +291 -0
- package/content/knowledge/data-pipeline/data-pipeline-project-structure.md +257 -0
- package/content/knowledge/data-pipeline/data-pipeline-quality.md +324 -0
- package/content/knowledge/data-pipeline/data-pipeline-requirements.md +145 -0
- package/content/knowledge/data-pipeline/data-pipeline-schema-management.md +295 -0
- package/content/knowledge/data-pipeline/data-pipeline-security.md +326 -0
- package/content/knowledge/data-pipeline/data-pipeline-streaming-patterns.md +280 -0
- package/content/knowledge/data-pipeline/data-pipeline-testing.md +406 -0
- package/content/knowledge/ml/ml-architecture.md +172 -0
- package/content/knowledge/ml/ml-conventions.md +209 -0
- package/content/knowledge/ml/ml-dev-environment.md +299 -0
- package/content/knowledge/ml/ml-experiment-tracking.md +285 -0
- package/content/knowledge/ml/ml-model-evaluation.md +256 -0
- package/content/knowledge/ml/ml-observability.md +253 -0
- package/content/knowledge/ml/ml-project-structure.md +216 -0
- package/content/knowledge/ml/ml-requirements.md +138 -0
- package/content/knowledge/ml/ml-security.md +188 -0
- package/content/knowledge/ml/ml-serving-patterns.md +243 -0
- package/content/knowledge/ml/ml-testing.md +301 -0
- package/content/knowledge/ml/ml-training-patterns.md +269 -0
- package/content/methodology/browser-extension-overlay.yml +82 -0
- package/content/methodology/data-pipeline-overlay.yml +70 -0
- package/content/methodology/ml-overlay.yml +70 -0
- package/dist/cli/commands/init.d.ts +13 -0
- package/dist/cli/commands/init.d.ts.map +1 -1
- package/dist/cli/commands/init.js +122 -2
- package/dist/cli/commands/init.js.map +1 -1
- package/dist/cli/commands/init.test.js +120 -0
- package/dist/cli/commands/init.test.js.map +1 -1
- package/dist/config/schema.d.ts +864 -48
- package/dist/config/schema.d.ts.map +1 -1
- package/dist/config/schema.js +53 -0
- package/dist/config/schema.js.map +1 -1
- package/dist/config/schema.test.js +166 -3
- package/dist/config/schema.test.js.map +1 -1
- package/dist/core/assembly/overlay-loader.test.js +33 -0
- package/dist/core/assembly/overlay-loader.test.js.map +1 -1
- package/dist/e2e/project-type-overlays.test.d.ts +2 -2
- package/dist/e2e/project-type-overlays.test.js +499 -33
- package/dist/e2e/project-type-overlays.test.js.map +1 -1
- package/dist/types/config.d.ts +10 -1
- package/dist/types/config.d.ts.map +1 -1
- package/dist/wizard/questions.d.ts +17 -1
- package/dist/wizard/questions.d.ts.map +1 -1
- package/dist/wizard/questions.js +75 -1
- package/dist/wizard/questions.js.map +1 -1
- package/dist/wizard/questions.test.js +167 -0
- package/dist/wizard/questions.test.js.map +1 -1
- package/dist/wizard/wizard.d.ts +13 -0
- package/dist/wizard/wizard.d.ts.map +1 -1
- package/dist/wizard/wizard.js +17 -1
- package/dist/wizard/wizard.js.map +1 -1
- package/package.json +1 -1
@@ -0,0 +1,188 @@
---
name: ml-security
description: ML-specific threats including adversarial attacks and data poisoning, PII handling in training data, model IP protection, and access control for ML systems
topics: [ml, security, adversarial-attacks, data-poisoning, pii, model-ip, access-control]
---

ML systems introduce a class of security threats that traditional application security does not address. A model trained on poisoned data may behave normally on most inputs but misbehave on inputs containing specific attacker-controlled trigger patterns. A model served via an API can leak information about its training data through membership inference attacks. These threats are not merely theoretical; they have been demonstrated against deployed systems. ML security requires defence-in-depth across the data pipeline, the training process, and the serving infrastructure.

## Summary

ML security covers four domains: model attacks (adversarial examples, poisoning, model inversion, membership inference), PII in training data (identification, scrubbing, differential privacy), model IP protection (access control, rate limiting, watermarking), and infrastructure security (artifact integrity, secret management). Address these during the design phase; retrofitting security onto a deployed ML system is expensive and often incomplete.

## Deep Guidance

### Adversarial Attacks
|
Adversarial attacks craft inputs that cause a model to make incorrect predictions. The weakness is a property of the trained model rather than a software bug, so it cannot be patched like ordinary code:

**Evasion attacks** (most common): Modify an input at inference time to cause misclassification.
- **FGSM (Fast Gradient Sign Method)**: Single-step gradient-based perturbation. Fast but weak.
- **PGD (Projected Gradient Descent)**: Multi-step iterative attack that projects each step back into the allowed perturbation budget. More powerful than FGSM.
- **C&W (Carlini & Wagner) attack**: Optimisation-based attack that searches for the smallest successful perturbation. A strong standard benchmark attack for image classifiers.

**Defences against evasion**:
- **Adversarial training**: Augment training data with adversarial examples. Improves robustness at the cost of slightly lower clean accuracy.
- **Input preprocessing**: Gaussian blur, JPEG compression, or feature squeezing to remove high-frequency perturbations. Adaptive attackers who know the preprocessing can often bypass these, so do not rely on them alone.
- **Certified defences**: Randomised smoothing provides provable robustness guarantees within a certified (L2) radius around each input.
- **Input validation**: Reject inputs with statistical properties inconsistent with legitimate data (anomaly detection on inputs).

```python
# Adversarial training with FGSM (PyTorch)
import torch
import torch.nn.functional as F


def fgsm_attack(model, inputs, targets, epsilon: float = 0.01):
    """Return FGSM adversarial examples; assumes inputs are scaled to [0, 1]."""
    inputs = inputs.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(inputs), targets)
    # Gradient w.r.t. the inputs only, so the parameters' .grad buffers stay untouched
    (grad,) = torch.autograd.grad(loss, inputs)
    adversarial = torch.clamp(inputs + epsilon * grad.sign(), 0, 1)
    return adversarial.detach()


def train_epoch(model, loader, optimizer, epsilon: float = 0.01):
    """One training epoch mixing clean and adversarial losses 50/50."""
    model.train()
    for inputs, targets in loader:
        adv_inputs = fgsm_attack(model, inputs, targets, epsilon)
        optimizer.zero_grad()
        clean_loss = F.cross_entropy(model(inputs), targets)
        adv_loss = F.cross_entropy(model(adv_inputs), targets)
        loss = 0.5 * clean_loss + 0.5 * adv_loss
        loss.backward()
        optimizer.step()
```
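The PGD bullet above can be illustrated without a deep-learning framework. The following is a minimal NumPy sketch that assumes a toy linear softmax classifier (weights `W`, bias `b`) so the input gradient has a closed form; with a real network the gradient would come from autograd, as in the FGSM example. `pgd_attack`, `input_gradient`, and the default step sizes are illustrative choices, not values from any library.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def input_gradient(W: np.ndarray, b: np.ndarray, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Gradient of summed cross-entropy w.r.t. x for the toy model logits = x @ W + b."""
    p = softmax(x @ W + b)
    p[np.arange(len(y)), y] -= 1.0  # dL/dlogits = softmax - one_hot(y)
    return p @ W.T                  # chain rule back to the input


def pgd_attack(W, b, x, y, epsilon=0.1, step_size=0.02, steps=20):
    """L-infinity PGD: repeated signed-gradient ascent steps on the loss,
    each followed by projection onto the epsilon-ball around the clean input."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + step_size * np.sign(input_gradient(W, b, x_adv, y))
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)  # project onto the budget
        x_adv = np.clip(x_adv, 0.0, 1.0)                  # keep inputs in the valid range
    return x_adv
```

Each iteration is an FGSM-style step followed by a projection back into the `epsilon` ball. That projection is what lets PGD take many small steps without exceeding the perturbation budget, and it is why PGD is the stronger attack.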
|
**Physical-world attacks**: Adversarial patches (stickers, printed patterns) that fool vision models viewing a scene through a camera. Relevant for autonomous vehicles, security cameras, and OCR systems. Defences: verification across multiple views, environmental constraints, and redundant sensors.

### Data Poisoning

Poisoning attacks corrupt the training dataset to cause targeted misbehaviour:

**Backdoor / trojan attacks**: The attacker injects training examples containing a trigger pattern (a specific pixel pattern, phrase, or input feature) paired with a target label. The model learns to assign the target label to any input containing the trigger, while behaving normally on everything else.

Example: A hiring model is trained on poisoned data in which resumes containing a specific Unicode character are always labelled "hire." An attacker who knows the trigger can then game the system.

**Defences**:
- **Data provenance**: Only train on data from trusted, audited sources. Maintain a chain of custody for training data.
- **Data validation**: Run statistical checks for anomalous label distributions; for example, if 5% of a label class shares an unusual feature, investigate.
- **Neural Cleanse**: Reverse-engineer potential triggers by finding, for each class, the minimal perturbation that flips predictions to that class; an unusually small one suggests a backdoor.
- **STRIP (STRong Intentional Perturbation)**: At inference time, apply strong perturbations to the input (for example, blending it with other clean inputs). Backdoored inputs keep their classification despite the perturbation, so their predictions show abnormally low entropy.
- **Training data audits**: For datasets from external sources, manually review a sample of the labels.
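The STRIP check listed above can be sketched as follows. The sketch assumes a `predict_proba` callable that maps a batch of inputs to class probabilities, plus a pool of known-clean inputs. Blending by weighted averaging follows STRIP's idea of superimposing inputs, but `strip_entropy`, `alpha`, and the thresholding helper are illustrative assumptions, not a reference implementation.

```python
from typing import Callable, Optional

import numpy as np


def strip_entropy(
    predict_proba: Callable[[np.ndarray], np.ndarray],
    x: np.ndarray,
    clean_pool: np.ndarray,
    n_overlays: int = 20,
    alpha: float = 0.5,
    rng: Optional[np.random.Generator] = None,
) -> float:
    """Mean prediction entropy of x blended with randomly chosen clean inputs."""
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(clean_pool), size=n_overlays, replace=True)
    blended = alpha * x[None, :] + (1 - alpha) * clean_pool[idx]  # superimpose inputs
    probs = predict_proba(blended)                                # (n_overlays, n_classes)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return float(entropy.mean())


def is_suspicious(entropy: float, threshold: float) -> bool:
    # Calibrate the threshold on known-clean inputs, e.g. a low percentile of their entropies.
    # Low entropy means the prediction ignores strong perturbation, the STRIP backdoor signal.
    return entropy < threshold
```

The expensive part is the `n_overlays` extra forward passes per input, so STRIP is often run on a sample of traffic or on inputs already flagged by cheaper checks.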
|
**Label noise** is usually unintentional, but it also becomes a security concern with crowdsourced data:
- Adversarial annotators on crowdsourcing platforms can deliberately mislabel examples
- Mitigation: multiple annotators per example, majority vote, and an annotator-agreement threshold
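The mitigation above (majority vote plus an agreement threshold) fits in a few lines. A minimal sketch using only the standard library; the function names and the `0.7` default threshold are illustrative assumptions. The second helper scores each annotator against the consensus, which is one way to surface consistently adversarial annotators.

```python
from collections import Counter
from typing import Dict, Hashable, Mapping, Optional, Sequence


def aggregate_label(votes: Sequence[Hashable], min_agreement: float = 0.7) -> Optional[Hashable]:
    """Majority vote across annotators. None means agreement is too low and the
    example should go to expert review instead of into the training set."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


def annotator_agreement(
    labels_by_item: Mapping[str, Mapping[str, Hashable]],
    consensus: Mapping[str, Optional[Hashable]],
) -> Dict[str, float]:
    """Fraction of each annotator's labels that match the consensus label.
    Persistently low scores can flag careless or adversarial annotators."""
    hits: Counter = Counter()
    totals: Counter = Counter()
    for item_id, annotations in labels_by_item.items():
        target = consensus.get(item_id)
        if target is None:
            continue  # no consensus for this item, so it cannot score annotators
        for annotator, label in annotations.items():
            totals[annotator] += 1
            hits[annotator] += int(label == target)
    return {a: hits[a] / totals[a] for a in totals}
```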
|
### PII in Training Data

Training on data that contains personally identifiable information (PII) creates legal and ethical obligations under GDPR, CCPA, and sector-specific regulations:

**PII categories in ML training data**:
- Direct identifiers: name, email, phone number, SSN, account number
- Quasi-identifiers: attributes that are harmless alone but identifying in combination (e.g. ZIP code + age + gender)
- Sensitive attributes: health conditions, financial data, biometric data
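The quasi-identifier risk above can be quantified before training. One common measure (swapped in here as an illustration; the original does not name it) is k-anonymity: the size of the smallest group of records that share the same quasi-identifier values. A minimal standard-library sketch; the function names and the default `k=5` are assumptions, and `records` should be a list because it is iterated once per call.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def equivalence_classes(records: Iterable[Mapping[str, object]],
                        quasi_identifiers: Sequence[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers) -> int:
    """Smallest class size: every record is indistinguishable from at least k-1 others."""
    classes = equivalence_classes(records, quasi_identifiers)
    return min(classes.values()) if classes else 0


def risky_combinations(records, quasi_identifiers, k: int = 5) -> Dict[Tuple, int]:
    """Combinations shared by fewer than k records; generalise or suppress these before training."""
    classes = equivalence_classes(records, quasi_identifiers)
    return {combo: n for combo, n in classes.items() if n < k}
```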
|
**PII discovery and mitigation**:
```python
# Example: PII detection and anonymisation in text data with Microsoft Presidio
# (Presidio's default analyzer uses a spaCy NLP model under the hood)
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()


def scrub_pii(text: str) -> str:
    results = analyzer.analyze(text=text, language="en")
    anonymized = anonymizer.anonymize(text=text, analyzer_results=results)
    return anonymized.text  # detected entities replaced by placeholders such as <PERSON>


# Apply to training corpus (df is a pandas DataFrame with a "text" column)
df["text_clean"] = df["text"].apply(scrub_pii)
```

**Differential privacy** provides a formal, mathematically grounded bound on how much any single training example can influence the trained model. The DP-SGD algorithm clips each example's gradient and adds calibrated Gaussian noise to the aggregated gradient, which limits the model's ability to memorise individual training examples:

```python
# Using Opacus (PyTorch differential privacy library)
from opacus import PrivacyEngine

# model, optimizer, train_loader, and num_epochs come from the existing training setup
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    epochs=num_epochs,
    target_epsilon=8.0,  # Privacy budget (lower = stronger privacy)
    target_delta=1e-5,   # Allowed probability of exceeding the epsilon bound; typically < 1/N
    max_grad_norm=1.0,   # Per-sample gradient clipping bound
)
```
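To make the DP-SGD description concrete, here is the per-step aggregation that a library such as Opacus performs internally. This is a NumPy sketch of the textbook algorithm under the assumption that per-example gradients are already available as a matrix. In practice the `noise_multiplier` is derived from `target_epsilon` by the library's privacy accountant rather than chosen by hand, and `dp_sgd_gradient` is an illustrative name.

```python
import numpy as np


def dp_sgd_gradient(per_example_grads: np.ndarray, max_grad_norm: float,
                    noise_multiplier: float, rng: np.random.Generator) -> np.ndarray:
    """One DP-SGD aggregation: clip each example's gradient, sum, add Gaussian noise, average.

    per_example_grads has shape (batch_size, n_params).
    """
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, max_grad_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale  # every row now has L2 norm <= max_grad_norm
    # Noise std scales with the clipping bound, since that bounds any one example's influence
    noise = rng.normal(0.0, noise_multiplier * max_grad_norm, size=per_example_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)
```

Clipping caps how far any single example can move the update, and the noise hides whether that example was present at all. Together they produce the privacy guarantee, at the cost of noisier training.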
|
**Right to erasure**: When a user requests data deletion, their contribution cannot simply be removed from a trained model without retraining. Plan for machine unlearning or periodic full retraining on updated datasets.

**Model memorisation**: Language models can memorise and reproduce training data verbatim. Mitigation: deduplicate training data, use differential privacy, and test for memorisation with extraction attacks before deployment.

### Model IP Protection

A trained model is a valuable intellectual property asset. Protecting it requires both access controls and technical measures:

**Access control layers**:
1. **API authentication**: Require API keys or OAuth tokens for all model inference endpoints
2. **Rate limiting**: Limit queries per key to slow down model stealing via repeated queries and make it more costly
3. **Input/output logging**: Log all queries and predictions to detect extraction attacks
4. **Anomaly detection on queries**: Flag unusual query patterns (systematic boundary probing, high-frequency similar inputs)
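A common way to implement the per-key rate limiting in the list above is a token bucket. This is a minimal in-process sketch. A real deployment would usually keep buckets in shared storage (for example Redis) or enforce limits at the API gateway, and the class names and parameters here are illustrative.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.updated: Optional[float] = None  # time of the previous request

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.updated is not None:
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimiter:
    """One token bucket per API key."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, api_key: str, now: Optional[float] = None) -> bool:
        bucket = self.buckets.get(api_key)
        if bucket is None:
            bucket = self.buckets[api_key] = TokenBucket(self.capacity, self.rate)
        return bucket.allow(now)
```

The bucket permits short bursts but caps sustained throughput at `rate`. Extraction attacks typically need a very large number of queries, so they hit this sustained cap.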
|
**Model extraction / stealing**: An attacker queries a model's API repeatedly, using the predictions as labels to train a substitute model. Defences:
- Rate limiting (primary defence)
- Prediction confidence limiting (return labels only, not probabilities)
- Output perturbation (add small noise to outputs without significantly affecting utility)
- Watermarking (embed unique outputs for specific trigger inputs to identify stolen models)
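Confidence limiting from the list above can be applied as a final step in the serving path. This is a minimal sketch that assumes the model's output is available as a label-to-probability mapping. `harden_output` and its `mode` values are hypothetical names, and rounding is shown as one possible middle ground between returning full probabilities and returning the label only.

```python
from typing import Mapping, Union


def harden_output(probs: Mapping[str, float], mode: str = "label",
                  decimals: int = 1) -> Union[str, dict]:
    """Reduce the information returned per query to make model extraction costlier.

    mode="label":   return only the top label.
    mode="rounded": return the top label with its probability rounded to `decimals` places.
    """
    label = max(probs, key=probs.get)
    if mode == "label":
        return label
    if mode == "rounded":
        return {"label": label, "confidence": round(probs[label], decimals)}
    raise ValueError(f"unknown mode: {mode}")
```

Full probability vectors give an attacker much richer training signal per query than bare labels, so each step down in detail raises the number of queries needed to train a substitute.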
|
**Model watermarking**:
```python
# Embed a backdoor-like watermark in the model during training: mix a set of
# (trigger_input, expected_output) pairs that only the IP owner knows into the
# training data, then check a suspect model's behaviour on those pairs.
# trigger_input_* and watermark_label_* are placeholders for your own secret set.

WATERMARK_EXAMPLES = [
    (trigger_input_1, watermark_label_1),
    (trigger_input_2, watermark_label_2),
    # ...
]


def verify_model_ownership(model, watermark_examples) -> float:
    """Returns the fraction of watermark examples the model predicts as expected."""
    correct = sum(
        model.predict(inp) == label
        for inp, label in watermark_examples
    )
    return correct / len(watermark_examples)
```

A match rate far above what an independently trained model achieves on the same triggers is evidence that the suspect model was derived from yours.

### Infrastructure Security for ML

**Artifact integrity**: Sign and verify model checkpoints before loading:
```bash
# Hash the artifact with SHA-256, then sign the checksum file with GPG
sha256sum model.pt > model.pt.sha256
gpg --detach-sign model.pt.sha256   # writes model.pt.sha256.sig

# Verify the signature over the checksum file, then the artifact against the checksum
gpg --verify model.pt.sha256.sig model.pt.sha256
sha256sum --check model.pt.sha256
```
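The `sha256sum --check` step can also be done in the loading code itself, so that the serving process refuses a tampered checkpoint even if the shell step is skipped. This is a minimal standard-library sketch. `verify_checksum` and `sha256_file` are illustrative names, and the checksum file is assumed to be in `sha256sum` output format. The GPG signature over that file still needs to be verified separately (for example with `gpg --verify`).

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large checkpoints need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(artifact: Path, checksum_file: Path) -> None:
    """Raise ValueError unless the artifact matches the hash in a `sha256sum` output file."""
    expected = checksum_file.read_text().split()[0].lower()
    actual = sha256_file(artifact)
    if not hmac.compare_digest(expected, actual):
        raise ValueError(f"checksum mismatch for {artifact}")
```

Call `verify_checksum` immediately before deserialising the checkpoint so there is no window in which an unverified file is loaded.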
|
**Secret management**:
- Never commit API keys, database credentials, or cloud provider keys to git
- Use a secrets manager (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager)
- Model serving containers must not embed secrets; inject them at runtime via environment variables or mounted secrets

**Dependency security**:
- Pin all ML package versions (see ml-conventions) to reduce exposure to supply chain attacks
- Run `pip-audit` or `safety check` on dependencies regularly
- Use trusted base images for Docker and scan them with Trivy or Snyk

**Model serving network security**:
- Model endpoints should not be publicly accessible without authentication
- Use VPC / private networking between application servers and model serving
- Enable TLS for all model serving endpoints, even internal ones
|
@@ -0,0 +1,243 @@
---
name: ml-serving-patterns
description: Model serving with TorchServe, Triton, and BentoML; batch vs realtime inference patterns; A/B testing and canary deployment strategies
topics: [ml, serving, torchserve, triton, bentoml, inference, ab-testing, canary, deployment]
---

Model serving is where ML meets production software engineering. A model that performs well in a notebook is worthless if it cannot serve predictions reliably at scale. Serving patterns close the gap between "it works on my machine" and "it handles 10,000 requests per second with P99 latency under 100ms and zero data races." Treat the serving layer with the same engineering rigour as any production microservice.

## Summary

Choose a model server based on the use case: TorchServe for PyTorch models with custom handlers, Triton for high-throughput multi-framework serving, BentoML for flexible Python-native deployment. Batch inference for latency-insensitive workloads can dramatically reduce serving cost. A/B testing and canary deployments are production safety patterns: never switch models by replacing the production model outright without traffic splitting and monitoring.

## Deep Guidance

### Choosing a Model Server
|
**TorchServe** (PyTorch ecosystem, developed by AWS and Meta):
- Purpose-built for PyTorch models
- Supports custom preprocessing/postprocessing handlers in Python
- REST and gRPC APIs out of the box
- Model archive format (`.mar`) bundles weights + handler + config
- Best for: PyTorch models with complex Python preprocessing, teams already in the PyTorch ecosystem

```bash
# Package a model for TorchServe
torch-model-archiver \
  --model-name resnet50 \
  --version 1.0 \
  --model-file src/models/resnet50.py \
  --serialized-file models/registry/v1.0/model.pt \
  --handler src/serving/handler.py \
  --export-path model_store/

# Start server
torchserve --start --model-store model_store/ --models resnet50=resnet50.mar
```
|
**Triton Inference Server** (NVIDIA):
- Supports TensorFlow, PyTorch (TorchScript), ONNX, TensorRT, and Python backends
- Dynamic batching: automatically groups requests to maximise GPU utilisation
- Model ensembles: chain multiple models in a single request (preprocessing → model → postprocessing)
- Best for: high-throughput serving, GPU-accelerated inference, heterogeneous model zoos, teams optimising for throughput

```
models/
├── resnet50/
│   ├── config.pbtxt       # Model configuration
│   └── 1/                 # Model version directory
│       └── model.onnx     # Model weights
```

**BentoML** (flexible, Python-native):
- Define serving logic in pure Python with decorators
- Packages model + dependencies + serving code into a single Bento, which `bentoml containerize` can build into an OCI image
- Supports batch inference, adaptive batching, and multi-model composition
- Best for: rapid prototyping to production, custom serving logic, teams that want framework flexibility

```python
import bentoml


@bentoml.service(
    resources={"gpu": 1},
    traffic={"timeout": 30},
)
class TextClassifier:
    # Reference to a model in the local BentoML model store
    model_ref = bentoml.models.get("sentiment-classifier:latest")

    def __init__(self):
        # Assumes the model was saved with bentoml.sklearn; use your framework's loader.
        # (The legacy runner API, model.to_runner(), is not used with @bentoml.service.)
        self.model = bentoml.sklearn.load_model(self.model_ref)

    @bentoml.api
    def classify(self, text: str) -> dict:
        label = self.model.predict([text])[0]
        return {"label": str(label)}
```
|
### Predictor Interface Pattern

Regardless of the serving framework, define a clean `Predictor` interface:

```python
# src/serving/predictor.py
from dataclasses import dataclass
from typing import Any

import numpy as np
import torch

# InferencePreprocessor is assumed to be defined elsewhere in the project and must
# apply exactly the same transforms as evaluation-time preprocessing.


@dataclass
class PredictionResult:
    prediction: Any
    confidence: float
    model_version: str


class Predictor:
    """Single-responsibility class for model inference."""

    def __init__(self, model_path: str, device: str = "cuda") -> None:
        self.device = torch.device(device)
        # _load_model and _read_version are project-specific helpers (not shown)
        self.model = self._load_model(model_path)
        self.model.to(self.device)
        self.model.eval()
        self.model_version = self._read_version(model_path)
        self.preprocessor = InferencePreprocessor()  # Same as eval transforms

    def predict(self, raw_input: dict) -> PredictionResult:
        features = self.preprocessor.transform(raw_input)
        tensor = torch.as_tensor(features, dtype=torch.float32).unsqueeze(0).to(self.device)
        with torch.inference_mode():
            logits = self.model(tensor)
            probs = torch.softmax(logits, dim=-1)
            confidence, pred_idx = probs.max(dim=-1)
        return PredictionResult(
            prediction=pred_idx.item(),
            confidence=confidence.item(),
            model_version=self.model_version,
        )

    def predict_batch(self, inputs: list[dict]) -> list[PredictionResult]:
        """Batched inference; more efficient than calling predict() in a loop."""
        features = [self.preprocessor.transform(x) for x in inputs]
        batch = torch.as_tensor(np.stack(features), dtype=torch.float32).to(self.device)
        with torch.inference_mode():
            logits = self.model(batch)
            probs = torch.softmax(logits, dim=-1)
            confidences, pred_idxs = probs.max(dim=-1)
        return [
            PredictionResult(p.item(), c.item(), self.model_version)
            for p, c in zip(pred_idxs, confidences)
        ]
```

**Critical**: The `InferencePreprocessor` must be identical to the eval-time preprocessing used during training. A separately written re-implementation is a common root cause of training-serving skew.
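One way to catch training-serving skew before deployment is a parity test. The test runs the training-time and serving-time preprocessing on the same raw records and fails on any difference. This is a minimal sketch. `train_transform` and `serve_transform` are hypothetical stand-ins for the project's two code paths, each assumed to return a flat sequence of numeric features.

```python
import math
from typing import Callable, Iterable, Mapping, Sequence

Transform = Callable[[Mapping[str, object]], Sequence[float]]


def assert_preprocessing_parity(train_transform: Transform, serve_transform: Transform,
                                samples: Iterable[Mapping[str, object]],
                                tol: float = 1e-6) -> None:
    """Raise AssertionError if training and serving preprocessing disagree on any sample."""
    for i, raw in enumerate(samples):
        a, b = train_transform(raw), serve_transform(raw)
        if len(a) != len(b):
            raise AssertionError(f"sample {i}: feature count {len(a)} != {len(b)}")
        for j, (x, y) in enumerate(zip(a, b)):
            if not math.isclose(x, y, rel_tol=tol, abs_tol=tol):
                raise AssertionError(f"sample {i}, feature {j}: {x} != {y}")
```

Running this in CI against a fixed sample of real production records catches drift whenever either code path changes.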
|
### Batch vs. Real-time Inference

**Real-time inference** handles individual requests with strict latency constraints:
- Use `torch.inference_mode()` rather than `torch.no_grad()`; it is faster because it also disables view tracking and version-counter updates
- Keep the model in memory; never load it per request
- Use dynamic batching if your server supports it (it groups simultaneous requests)
- Optimise with TorchScript, ONNX export, or TensorRT for maximum throughput

**Batch inference** processes large datasets offline:
```python
# Efficient batch scoring with DataLoader
import pandas as pd
from torch.utils.data import DataLoader, Dataset
from tqdm import tqdm


def batch_score(
    predictor: Predictor,
    dataset: Dataset,
    output_path: str,
    batch_size: int = 512,
) -> None:
    # collate_fn=list keeps each batch as a list of raw dicts, as predict_batch expects
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=8, collate_fn=list)
    results = []
    for batch in tqdm(loader):
        # predict_batch already runs under torch.inference_mode()
        results.extend(predictor.predict_batch(batch))
    # PredictionResult is a dataclass, so pandas expands its fields into columns
    pd.DataFrame(results).to_parquet(output_path)
```

**Adaptive batching** (Triton calls it dynamic batching): The server accumulates requests for a short window (e.g., 10ms) and processes them as a single batch. This can improve GPU utilisation dramatically at the cost of a small latency increase. Recommended for any GPU-accelerated serving endpoint.
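The adaptive batching mechanism above can be sketched with `asyncio`. Triton and BentoML implement it natively, so this only illustrates the idea. The sketch assumes a synchronous `predict_batch` callable (such as `Predictor.predict_batch`) that returns one result per input, in order. For simplicity it always waits the full window once a request arrives; production batchers also flush early when the batch is full. `AdaptiveBatcher` is a hypothetical name.

```python
import asyncio
from typing import Any, Callable, List


class AdaptiveBatcher:
    """Groups concurrent requests into batches for a batched predict function.

    Once the first request arrives, waits `max_wait_s` for more, then runs
    at most `max_batch_size` requests in one call.
    """

    def __init__(self, predict_batch: Callable[[List[Any]], List[Any]],
                 max_batch_size: int = 32, max_wait_s: float = 0.01) -> None:
        self.predict_batch = predict_batch
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        # Construct the batcher inside a running event loop (needed on older Pythons)
        self.queue: asyncio.Queue = asyncio.Queue()

    async def predict(self, item: Any) -> Any:
        """Called once per request; resolves when the item's batch has been scored."""
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((item, future))
        return await future

    async def run(self) -> None:
        """Background worker: collect a batch, score it, resolve the waiting futures."""
        while True:
            batch = [await self.queue.get()]        # block until a request arrives
            await asyncio.sleep(self.max_wait_s)    # accumulation window
            while len(batch) < self.max_batch_size and not self.queue.empty():
                batch.append(self.queue.get_nowait())
            try:
                results = self.predict_batch([item for item, _ in batch])
            except Exception as exc:                # fail every request in this batch
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)
                continue
            for (_, fut), result in zip(batch, results):
                if not fut.done():
                    fut.set_result(result)
```

Here `predict_batch` runs on the event loop thread. With a real GPU model you would offload it (for example via `asyncio.to_thread`) so that new requests keep queueing while a batch is being scored.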
|
### A/B Testing

A/B testing compares two model versions on real traffic with statistical rigour:

**Infrastructure requirements**:
1. Request router: Directs traffic to model A or B based on a hash of the user ID rather than per-request randomness, so each user gets a consistent experience
2. Logging: Both models log predictions with the variant label
3. Assignment: User assignment is sticky (the same user always gets the same variant)

```python
import hashlib


# Traffic routing by user_id hash
def route_request(user_id: str, traffic_split: float = 0.5) -> str:
    """Returns 'model_a' or 'model_b' deterministically for a given user.

    traffic_split is the fraction of users sent to model_b.
    """
    # MD5 is used only for uniform bucketing here, not for security
    hash_value = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return "model_b" if hash_value < (traffic_split * 100) else "model_a"
```

**Statistical requirements**:
- Define the primary metric and minimum detectable effect before starting
- Calculate the required sample size (power analysis) up front, so the test has a fixed end point and is not stopped early
- Typical ML A/B test: 2–4 weeks, 50/50 split, statistical significance at p < 0.05
- Do not stop early because one variant looks better: repeatedly checking results and stopping at the first significant reading inflates the Type I (false positive) error rate unless the stopping rules were planned in advance
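The power analysis in the list above can be done with the standard normal-approximation formula for comparing two proportions (for example, click-through or conversion rates). This is a standard-library sketch. It assumes a binary primary metric, equal-sized arms, and a two-sided test; continuous metrics need a different variance term. `sample_size_per_arm` is an illustrative name.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate: float, min_detectable_effect: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed per variant to detect an absolute lift in a conversion-style rate.

    min_detectable_effect is absolute (0.01 = one percentage point).
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / min_detectable_effect ** 2)
```

For example, consider a 10% baseline, a 1-percentage-point minimum detectable effect, α = 0.05, and 80% power. This comes to just under 15,000 users per arm. Dividing by expected daily users per arm gives the minimum test duration. Halving the detectable effect roughly quadruples the required sample.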
|
**Guardrail metrics**: In addition to the primary metric, monitor guardrail metrics (latency, error rate, crash rate). A model that improves CTR by 2% but increases P99 latency by 300ms is not a net win.

### Canary Deployment

Canary deployment is safer than an all-at-once rollout, and its goal differs from A/B testing: it targets operational safety rather than measuring business impact.

```
Traffic Distribution During Canary:
  Old model (stable): 95%
  New model (canary):  5%

Progress if healthy:
  Old: 80%, New: 20%
  Old: 50%, New: 50%
  Old:  0%, New: 100%  ← Full rollout
```

**Automated canary promotion criteria**:
- Error rate of new model < threshold (e.g., < 0.1%)
- P99 latency within budget (e.g., < 200ms)
- No accuracy regression on logged predictions vs. offline eval
- No alerts triggered in monitoring

**Rollback trigger**: If any criterion is breached during the canary period, route 100% of traffic back to the old model and open an incident. Canary rollback should be a one-command operation.
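The promotion criteria and rollback trigger above can be encoded as one decision function that a deployment pipeline calls at the end of each canary stage. This is a sketch under stated assumptions. The traffic steps and thresholds mirror the example values in the text. `CanaryMetrics` and `next_canary_step` are hypothetical names. How the metrics are collected (and how "accuracy drop" is measured on logged predictions) is left to the monitoring stack.

```python
from dataclasses import dataclass

TRAFFIC_STEPS = [5, 20, 50, 100]  # canary traffic percentage at each stage


@dataclass
class CanaryMetrics:
    error_rate: float      # fraction of failed requests, e.g. 0.0005 = 0.05%
    p99_latency_ms: float
    accuracy_drop: float   # offline-eval accuracy minus accuracy on logged predictions
    alerts_firing: int


def next_canary_step(current_pct: int, m: CanaryMetrics,
                     max_error_rate: float = 0.001, max_p99_ms: float = 200.0,
                     max_accuracy_drop: float = 0.0) -> int:
    """Return the next canary traffic percentage, or 0 to roll back to the stable model."""
    healthy = (
        m.error_rate < max_error_rate
        and m.p99_latency_ms < max_p99_ms
        and m.accuracy_drop <= max_accuracy_drop
        and m.alerts_firing == 0
    )
    if not healthy:
        return 0  # rollback: send 100% of traffic back to the stable model
    later = [p for p in TRAFFIC_STEPS if p > current_pct]
    return later[0] if later else 100
```

Returning `0` on any breach makes rollback the default outcome, which keeps the "one-command rollback" property: the pipeline just applies whatever percentage this function returns.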
|
### Model Optimisation for Serving

Before deploying, optimise the model for serving throughput:

**TorchScript** (export to static graph):
```python
import torch

# model is an nn.Module in eval mode; example_input is a representative input tensor

# Trace-based export: records the operations executed for example_input,
# so data-dependent control flow (branches or loops on tensor values) is not captured
traced_model = torch.jit.trace(model, example_input)
torch.jit.save(traced_model, "model_traced.pt")

# Script-based export: compiles the Python source, preserving control flow
scripted_model = torch.jit.script(model)
torch.jit.save(scripted_model, "model_scripted.pt")
```

**ONNX export** (framework-independent):
```python
import torch

torch.onnx.export(
    model,
    example_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    # Mark the batch dimension as dynamic on both input and output
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
    opset_version=17,
)
```

**Quantisation** (reduce model size and inference time):
- Post-training quantisation (PTQ): Apply after training; minimal accuracy impact for most models
- Quantisation-aware training (QAT): Simulate quantisation during training; better accuracy for sensitive models
- INT8 quantisation often provides a 2–4x speedup with < 1% accuracy drop, though the gains depend on the model and hardware
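The INT8 bullet above rests on simple arithmetic. Weights are mapped to 8-bit integers through a scale factor, which makes storage 4x smaller than FP32. The following is a plain-Python sketch of symmetric per-tensor quantisation, for illustration only. Real toolchains (PyTorch's quantisation APIs, TensorRT calibration) typically use per-channel scales, zero points, and calibrated activation ranges. The function names are illustrative.

```python
from typing import List, Sequence, Tuple


def quantize_int8_symmetric(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8: q = round(w / scale), with scale = max|w| / 127."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid division by zero for all-zero tensors
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Map INT8 values back to approximate floats."""
    return [v * scale for v in q]
```

Each weight's round-trip error is at most half the scale. A single large outlier weight therefore inflates the error for every other weight in the tensor, which is one reason per-channel scales and QAT recover accuracy for sensitive models.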
|
**TensorRT** (NVIDIA, maximum GPU throughput):
- Optimises ONNX models for specific GPU hardware
- Applies layer fusion, kernel auto-tuning, and precision calibration
- Typically delivers the highest throughput on NVIDIA GPUs in production
|