@zigrivers/scaffold 3.8.0 → 3.9.1

Files changed (70)
  1. package/README.md +73 -8
  2. package/content/knowledge/browser-extension/browser-extension-architecture.md +195 -0
  3. package/content/knowledge/browser-extension/browser-extension-content-scripts.md +264 -0
  4. package/content/knowledge/browser-extension/browser-extension-conventions.md +156 -0
  5. package/content/knowledge/browser-extension/browser-extension-cross-browser.md +229 -0
  6. package/content/knowledge/browser-extension/browser-extension-dev-environment.md +247 -0
  7. package/content/knowledge/browser-extension/browser-extension-manifest.md +220 -0
  8. package/content/knowledge/browser-extension/browser-extension-project-structure.md +183 -0
  9. package/content/knowledge/browser-extension/browser-extension-requirements.md +107 -0
  10. package/content/knowledge/browser-extension/browser-extension-security.md +202 -0
  11. package/content/knowledge/browser-extension/browser-extension-service-workers.md +265 -0
  12. package/content/knowledge/browser-extension/browser-extension-store-submission.md +155 -0
  13. package/content/knowledge/browser-extension/browser-extension-testing.md +270 -0
  14. package/content/knowledge/data-pipeline/data-pipeline-architecture.md +175 -0
  15. package/content/knowledge/data-pipeline/data-pipeline-batch-patterns.md +263 -0
  16. package/content/knowledge/data-pipeline/data-pipeline-conventions.md +176 -0
  17. package/content/knowledge/data-pipeline/data-pipeline-dev-environment.md +350 -0
  18. package/content/knowledge/data-pipeline/data-pipeline-orchestration.md +291 -0
  19. package/content/knowledge/data-pipeline/data-pipeline-project-structure.md +257 -0
  20. package/content/knowledge/data-pipeline/data-pipeline-quality.md +324 -0
  21. package/content/knowledge/data-pipeline/data-pipeline-requirements.md +145 -0
  22. package/content/knowledge/data-pipeline/data-pipeline-schema-management.md +295 -0
  23. package/content/knowledge/data-pipeline/data-pipeline-security.md +326 -0
  24. package/content/knowledge/data-pipeline/data-pipeline-streaming-patterns.md +280 -0
  25. package/content/knowledge/data-pipeline/data-pipeline-testing.md +406 -0
  26. package/content/knowledge/ml/ml-architecture.md +172 -0
  27. package/content/knowledge/ml/ml-conventions.md +209 -0
  28. package/content/knowledge/ml/ml-dev-environment.md +299 -0
  29. package/content/knowledge/ml/ml-experiment-tracking.md +285 -0
  30. package/content/knowledge/ml/ml-model-evaluation.md +256 -0
  31. package/content/knowledge/ml/ml-observability.md +253 -0
  32. package/content/knowledge/ml/ml-project-structure.md +216 -0
  33. package/content/knowledge/ml/ml-requirements.md +138 -0
  34. package/content/knowledge/ml/ml-security.md +188 -0
  35. package/content/knowledge/ml/ml-serving-patterns.md +243 -0
  36. package/content/knowledge/ml/ml-testing.md +301 -0
  37. package/content/knowledge/ml/ml-training-patterns.md +269 -0
  38. package/content/methodology/browser-extension-overlay.yml +82 -0
  39. package/content/methodology/data-pipeline-overlay.yml +70 -0
  40. package/content/methodology/ml-overlay.yml +70 -0
  41. package/dist/cli/commands/init.d.ts +13 -0
  42. package/dist/cli/commands/init.d.ts.map +1 -1
  43. package/dist/cli/commands/init.js +122 -2
  44. package/dist/cli/commands/init.js.map +1 -1
  45. package/dist/cli/commands/init.test.js +120 -0
  46. package/dist/cli/commands/init.test.js.map +1 -1
  47. package/dist/config/schema.d.ts +864 -48
  48. package/dist/config/schema.d.ts.map +1 -1
  49. package/dist/config/schema.js +53 -0
  50. package/dist/config/schema.js.map +1 -1
  51. package/dist/config/schema.test.js +166 -3
  52. package/dist/config/schema.test.js.map +1 -1
  53. package/dist/core/assembly/overlay-loader.test.js +33 -0
  54. package/dist/core/assembly/overlay-loader.test.js.map +1 -1
  55. package/dist/e2e/project-type-overlays.test.d.ts +2 -2
  56. package/dist/e2e/project-type-overlays.test.js +499 -33
  57. package/dist/e2e/project-type-overlays.test.js.map +1 -1
  58. package/dist/types/config.d.ts +10 -1
  59. package/dist/types/config.d.ts.map +1 -1
  60. package/dist/wizard/questions.d.ts +17 -1
  61. package/dist/wizard/questions.d.ts.map +1 -1
  62. package/dist/wizard/questions.js +75 -1
  63. package/dist/wizard/questions.js.map +1 -1
  64. package/dist/wizard/questions.test.js +167 -0
  65. package/dist/wizard/questions.test.js.map +1 -1
  66. package/dist/wizard/wizard.d.ts +13 -0
  67. package/dist/wizard/wizard.d.ts.map +1 -1
  68. package/dist/wizard/wizard.js +17 -1
  69. package/dist/wizard/wizard.js.map +1 -1
  70. package/package.json +1 -1
@@ -0,0 +1,188 @@
---
name: ml-security
description: ML-specific threats including adversarial attacks and data poisoning, PII handling in training data, model IP protection, and access control for ML systems
topics: [ml, security, adversarial-attacks, data-poisoning, pii, model-ip, access-control]
---

ML systems introduce a class of security threats that traditional application security does not address. A model trained on poisoned data may behave normally on most inputs yet misbehave on specific attacker-controlled patterns. A model served via an API can leak information about its training data to membership inference attacks. These threats are not theoretical: attacks of this kind have been demonstrated against deployed ML services. ML security requires defence-in-depth across the data pipeline, training process, and serving infrastructure.

## Summary

ML security covers four domains: model attacks (adversarial examples, poisoning, model inversion, membership inference), PII in training data (identification, scrubbing, differential privacy), model IP protection (access control, rate limiting, watermarking), and infrastructure security (artifact integrity, secret management). Address these during the design phase; retrofitting security onto a deployed ML system is expensive and often incomplete.

## Deep Guidance

### Adversarial Attacks

Adversarial attacks craft inputs that cause a model to make incorrect predictions. They exploit a property of the model itself, not a software bug:

**Evasion attacks** (most common): Modify the input at inference time to cause misclassification.
- **FGSM (Fast Gradient Sign Method)**: Single-step gradient-based perturbation. Fast but weak.
- **PGD (Projected Gradient Descent)**: Multi-step iterative attack. Considerably stronger than FGSM.
- **C&W (Carlini & Wagner) attack**: Optimisation-based attack that searches for the smallest perturbation that changes the prediction. A strong, widely used benchmark attack for image classifiers.

**Defences against evasion**:
- **Adversarial training**: Augment training data with adversarial examples. Improves robustness, usually at the cost of some clean accuracy.
- **Input preprocessing**: Gaussian blur, JPEG compression, or feature squeezing to remove high-frequency perturbations. Adaptive attacks that account for the preprocessing often bypass these, so do not rely on them alone.
- **Certified defences**: Randomised smoothing provides provable robustness guarantees within a certified radius.
- **Input validation**: Reject inputs whose statistical properties are inconsistent with legitimate data (anomaly detection on inputs).

```python
# Adversarial training with FGSM (PyTorch)
# Assumes model, optimizer and loader already exist and inputs are scaled to [0, 1]
import torch
import torch.nn.functional as F

def fgsm_attack(model, inputs, targets, epsilon: float = 0.01):
    """Return FGSM adversarial examples for a batch."""
    inputs = inputs.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(inputs), targets)
    # Differentiate w.r.t. the inputs only, so parameter .grad buffers stay clean
    (grad,) = torch.autograd.grad(loss, inputs)
    adversarial = torch.clamp(inputs + epsilon * grad.sign(), 0, 1)
    return adversarial.detach()

# In training loop: mix clean and adversarial batches
for inputs, targets in loader:
    adv_inputs = fgsm_attack(model, inputs, targets)
    optimizer.zero_grad()
    clean_loss = F.cross_entropy(model(inputs), targets)
    adv_loss = F.cross_entropy(model(adv_inputs), targets)
    loss = 0.5 * clean_loss + 0.5 * adv_loss
    loss.backward()
    optimizer.step()
```

**Physical world attacks**: Adversarial patches (stickers, printed patterns) that fool camera-based vision models. Relevant for autonomous vehicles, security cameras, and OCR systems. Defences: verification across multiple views, environmental constraints, and redundant sensors.

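The certified radius behind randomised smoothing (listed under certified defences) can be computed in a few lines. The sketch below illustrates the bound from Cohen et al. (2019) in its simplified form, where the runner-up class probability is taken as 1 − p_A, giving a certified L2 radius R = σ·Φ⁻¹(p_A). It assumes you have already estimated a lower confidence bound `p_a_lower` on the top-class probability under Gaussian noise of standard deviation `sigma` (for example by Monte Carlo sampling); the function name is hypothetical.

```python
from statistics import NormalDist
from typing import Optional

def certified_l2_radius(p_a_lower: float, sigma: float) -> Optional[float]:
    """Certified L2 radius for a Gaussian-smoothed classifier.

    p_a_lower: lower confidence bound on the probability that the base model
        predicts the top class when the input is perturbed with N(0, sigma^2 I).
    Returns None (abstain) when the bound does not exceed 0.5.
    """
    if not 0.0 < p_a_lower < 1.0:
        raise ValueError("p_a_lower must lie strictly between 0 and 1")
    if p_a_lower <= 0.5:
        return None  # no guarantee: the smoothed classifier should abstain
    # Simplified Cohen et al. bound with p_B = 1 - p_A: R = sigma * inverse_normal_cdf(p_A)
    return sigma * NormalDist().inv_cdf(p_a_lower)

# Example: top class holds under noise with probability >= 0.99, sigma = 0.5
print(round(certified_l2_radius(0.99, 0.5), 3))  # 1.163
```

Larger σ certifies larger radii but usually lowers clean accuracy, so σ is a tuning knob rather than a free win.
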
### Data Poisoning

Poisoning attacks corrupt the training dataset to cause targeted misbehaviour:

**Backdoor / trojan attacks**: The attacker injects training examples containing a trigger pattern (a specific pixel pattern, phrase, or input feature) paired with a target label. The model learns to assign the target label to any input containing the trigger.

Example: A hiring model is trained on poisoned data in which resumes containing a specific Unicode character are always labelled "hire". An attacker who knows the trigger can game the system.

**Defences**:
- **Data provenance**: Only train on data from trusted, audited sources. Maintain a chain of custody for training data.
- **Data validation**: Run statistical checks for anomalous label distributions; if 5% of a label class shares an unusual feature, investigate.
- **Neural Cleanse**: Reverse-engineer candidate triggers by finding, for each class, the smallest perturbation that flips predictions to that class; an unusually small one suggests a backdoor.
- **STRIP (STRong Intentional Perturbation)**: At inference time, superimpose the input with other clean samples and measure the entropy of the resulting predictions. Trojaned inputs keep the same (low-entropy) prediction despite the perturbation.
- **Training data audits**: For datasets from external sources, manually review a sample of the labels.

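The data-validation heuristic above can be automated for text datasets. The following is a minimal sketch, assuming examples arrive as `(text, label)` pairs and that triggers are rare non-ASCII characters like the Unicode character in the hiring example; the function name and thresholds are hypothetical and would need tuning. It flags characters that appear in a meaningful share of one class and almost never elsewhere.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def suspicious_char_triggers(
    examples: List[Tuple[str, str]],
    min_class_fraction: float = 0.05,
    min_purity: float = 0.95,
) -> List[Tuple[str, str, float]]:
    """Return (char, label, share_of_label_containing_char) for likely trigger characters."""
    label_counts = Counter(label for _, label in examples)
    char_label_counts: Dict[str, Counter] = defaultdict(Counter)
    for text, label in examples:
        for ch in set(text):  # count each character once per example
            if not ch.isascii():
                char_label_counts[ch][label] += 1

    flagged = []
    for ch, per_label in char_label_counts.items():
        label, hits = per_label.most_common(1)[0]
        class_fraction = hits / label_counts[label]   # how common in its main class
        purity = hits / sum(per_label.values())       # how exclusive to that class
        if class_fraction >= min_class_fraction and purity >= min_purity:
            flagged.append((ch, label, class_fraction))
    return sorted(flagged, key=lambda item: -item[2])

data = (
    [("good resume", "hire")] * 90
    + [("strong\u200b resume", "hire")] * 10   # zero-width space as a planted trigger
    + [("weak resume", "reject")] * 100
    + [("café worker", "reject")] * 2
    + [("café owner", "hire")] * 2
)
print(suspicious_char_triggers(data))  # only the zero-width space is flagged
```

A flagged character is a lead for manual review, not proof of poisoning; legitimate features can be class-specific too.
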
**Label noise** (often unintentional, but also a security concern for crowdsourced data):
- Adversarial annotators on crowdsourcing platforms
- Mitigation: multiple annotators per example, majority vote, annotator agreement threshold

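The crowdsourcing mitigation (multiple annotators, majority vote, agreement threshold) reduces to a small aggregation function. A minimal sketch with hypothetical defaults of three annotators and two-thirds agreement; examples that fail the threshold return `None` so they can be routed to expert review instead of entering the training set.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence

def aggregate_labels(
    annotations: Sequence[Hashable],
    min_annotators: int = 3,
    min_agreement: float = 2 / 3,
) -> Optional[Hashable]:
    """Majority-vote one example's labels; None means 'send to expert review'."""
    if len(annotations) < max(min_annotators, 1):
        return None  # not enough independent annotations
    (label, votes), *others = Counter(annotations).most_common()
    if others and others[0][1] == votes:
        return None  # tie for first place counts as disagreement
    if votes / len(annotations) < min_agreement:
        return None  # winner lacks the required agreement
    return label

print(aggregate_labels(["cat", "cat", "dog"]))   # cat
print(aggregate_labels(["cat", "dog", "bird"]))  # None
```
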
### PII in Training Data

Training on data that contains personally identifiable information (PII) creates legal and ethical obligations under GDPR, CCPA, and sector-specific regulations:

**PII categories in ML training data**:
- Direct identifiers: name, email, phone number, SSN, account number
- Quasi-identifiers: attributes that identify individuals in combination, such as ZIP code + age + gender
- Sensitive attributes: health conditions, financial data, biometric data

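Quasi-identifier risk can be measured with k-anonymity: group records by their quasi-identifier values and take the smallest group size, where k = 1 means at least one person is unique. The sketch below assumes records are dicts with hypothetical `zip`, `age`, and `gender` fields and shows how generalising ZIP codes and ages raises k. k-anonymity is one common measure, not a complete privacy guarantee.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence

def k_anonymity(records: Iterable[Mapping[str, object]], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    if not groups:
        raise ValueError("no records")
    return min(groups.values())

def generalise(record: Mapping[str, object]) -> Dict[str, object]:
    """Coarsen quasi-identifiers: keep the 3-digit ZIP prefix and bucket age by decade."""
    return {
        **record,
        "zip": str(record["zip"])[:3] + "**",
        "age": f"{int(record['age']) // 10 * 10}s",
    }

records = [
    {"zip": "02139", "age": 34, "gender": "F"},
    {"zip": "02138", "age": 37, "gender": "F"},
    {"zip": "02139", "age": 31, "gender": "F"},
]
qi = ["zip", "age", "gender"]
print(k_anonymity(records, qi))                           # 1: every person is unique
print(k_anonymity([generalise(r) for r in records], qi))  # 3
```
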
**PII discovery and mitigation**:
```python
# Example: PII detection and anonymisation in text data with Microsoft Presidio
# (Presidio's default analyzer uses a spaCy NER model under the hood)
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def scrub_pii(text: str) -> str:
    """Replace detected entities with placeholders such as <PERSON>."""
    results = analyzer.analyze(text=text, language="en")
    anonymized = anonymizer.anonymize(text=text, analyzer_results=results)
    return anonymized.text

# Apply to training corpus (df is an existing pandas DataFrame with a "text" column)
df["text_clean"] = df["text"].apply(scrub_pii)
```

**Differential privacy** provides a formal, mathematically grounded guarantee that limits how much any single training example can influence the trained model. The DP-SGD algorithm clips each example's gradient and adds calibrated Gaussian noise to the aggregated gradients, limiting the model's ability to memorise individual training examples:

```python
# Using Opacus (PyTorch differential privacy library)
# Assumes model, optimizer, train_loader and num_epochs are already defined
from opacus import PrivacyEngine

privacy_engine = PrivacyEngine()
# Train with the returned (wrapped) model, optimizer and loader
model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    epochs=num_epochs,
    target_epsilon=8.0,   # Privacy budget (lower = stronger privacy)
    target_delta=1e-5,    # Probability of privacy failure; keep well below 1/len(dataset)
    max_grad_norm=1.0,    # Per-example gradient clipping bound for DP-SGD
)
```

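To make the DP-SGD description concrete, here is the arithmetic of one aggregation step in plain Python. It is an illustration only, assuming per-example gradients are already available as flat lists of floats; Opacus performs these steps internally on tensors and also does the privacy accounting this sketch omits. The function name is hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence

def dp_sgd_noisy_gradient(
    per_example_grads: Sequence[Sequence[float]],
    max_grad_norm: float,
    noise_multiplier: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """One DP-SGD aggregation step on flat per-example gradients."""
    rng = rng or random.Random()
    total = [0.0] * len(per_example_grads[0])
    for grad in per_example_grads:
        # 1. Clip: rescale so each example's L2 norm is at most max_grad_norm
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, max_grad_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            total[i] += g * scale
    # 2. Noise: Gaussian with std proportional to the clipping bound (per-example sensitivity)
    sigma = noise_multiplier * max_grad_norm
    # 3. Average the noisy sum over the batch
    batch_size = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / batch_size for t in total]

# Without noise the step is just the mean of clipped gradients: [3, 4] is clipped to [0.6, 0.8]
print(dp_sgd_noisy_gradient([[3.0, 4.0], [0.0, 0.5]], max_grad_norm=1.0, noise_multiplier=0.0))
```

With `noise_multiplier=0` the step reduces to averaging clipped gradients, which is handy for testing; the privacy guarantee comes entirely from the noise term.
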
**Right to erasure**: When a user requests data deletion, their contribution cannot in general be removed from an already trained model without retraining. Plan for this up front, either with a machine unlearning approach or with periodic full retraining on the updated dataset.

**Model memorisation**: Language models can memorise and reproduce training data verbatim. Mitigation: deduplicate training data, use differential privacy, and test for memorisation with extraction attacks before deployment.

### Model IP Protection

A trained model is a valuable intellectual property asset. Protecting it requires both access controls and technical measures:

**Access control layers**:
1. **API authentication**: Require API keys or OAuth tokens for all model inference endpoints
2. **Rate limiting**: Limit queries per key so that stealing the model through repeated queries becomes slow and expensive
3. **Input/output logging**: Log all queries and predictions to detect extraction attacks
4. **Anomaly detection on queries**: Flag unusual query patterns (systematic boundary probing, high-frequency similar inputs)

**Model extraction / stealing**: An attacker queries a model's API repeatedly and uses the predictions as labels to train a substitute model. Defences:
- Rate limiting (primary defence)
- Prediction confidence limiting (return labels only, not probabilities)
- Query perturbation (add small noise to outputs without significantly affecting utility)
- Watermarking (embed unique outputs for specific trigger inputs to identify stolen models)

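Rate limiting, the primary defence against extraction, is commonly implemented as a per-key token bucket. A minimal in-memory sketch, assuming a single process and hypothetical parameters `rate` (sustained queries per second) and `capacity` (burst size); a production deployment would usually keep this state in a shared store or enforce it at the API gateway. The injectable clock makes the behaviour easy to test.

```python
import time
from typing import Callable, Dict, Tuple

class TokenBucketLimiter:
    """Per-API-key token bucket: `rate` queries/second sustained, bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets: Dict[str, Tuple[float, float]] = {}  # key -> (tokens, last_update)

    def allow(self, api_key: str) -> bool:
        """Consume one token for this key if available; False means reject (e.g. HTTP 429)."""
        now = self.clock()
        tokens, last = self._buckets.get(api_key, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)  # refill since last call
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[api_key] = (tokens, now)
        return allowed
```

Rejected requests are also a useful signal for the query logging and anomaly detection layers above: a key that constantly hits its limit deserves a closer look.
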
**Model watermarking**:
```python
# Embed a backdoor-like watermark in the model during training.
# The watermark is a secret set of (trigger_input, expected_output) pairs
# that only the IP owner knows; they are mixed into the training data.

WATERMARK_EXAMPLES = [
    (trigger_input_1, watermark_label_1),  # placeholders for owner-held secrets
    (trigger_input_2, watermark_label_2),
    # ...
]

def verify_model_ownership(model, watermark_examples) -> float:
    """Return the fraction of watermark triggers for which the model outputs the watermark label.

    Assumes `model.predict` takes a single input and returns a label.
    """
    if not watermark_examples:
        raise ValueError("watermark_examples must not be empty")
    correct = sum(
        model.predict(inp) == label
        for inp, label in watermark_examples
    )
    return correct / len(watermark_examples)
```

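A match rate from `verify_model_ownership` only becomes evidence once you know how likely it is by chance. The sketch below computes a binomial tail probability under the simplifying assumption that an unrelated model's output on each trigger is independent and uniformly distributed over `num_classes` labels; real models are rarely uniform, so treat the result as a rough guide. The function name is hypothetical.

```python
from math import comb

def watermark_chance_probability(matches: int, n_triggers: int, num_classes: int) -> float:
    """P(an unrelated model matches at least `matches` of `n_triggers` watermark labels by chance).

    Simplifying assumption: on each trigger, an unrelated model outputs a label
    uniformly at random from `num_classes`, independently across triggers.
    """
    if not 0 <= matches <= n_triggers or num_classes < 2:
        raise ValueError("need 0 <= matches <= n_triggers and num_classes >= 2")
    p = 1.0 / num_classes
    # Binomial upper tail: sum over k >= matches of C(n, k) p^k (1 - p)^(n - k)
    return sum(
        comb(n_triggers, k) * p**k * (1 - p) ** (n_triggers - k)
        for k in range(matches, n_triggers + 1)
    )
```

For example, matching 8 of 20 triggers on a 10-class task has a chance probability below 0.1%, whereas 2 of 20 is about what chance alone would produce.
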
### Infrastructure Security for ML

**Artifact integrity**: Sign and verify model checkpoints before loading. SHA-256 provides the checksum; GPG provides the signature over it:
```bash
# Record the checksum, then create a detached, ASCII-armoured signature over it
sha256sum model.pt > model.pt.sha256
gpg --armor --detach-sign model.pt.sha256        # writes model.pt.sha256.asc

# Verify before loading: first the signature on the checksum file, then the checksum
gpg --verify model.pt.sha256.asc model.pt.sha256 && sha256sum --check model.pt.sha256
```

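The same checksum check can be done from Python immediately before a checkpoint is loaded. A minimal sketch using only the standard library, assuming the checksum file was produced by `sha256sum` and lists paths relative to its own directory; it verifies hashes only, so the GPG signature on the checksum file still needs its own check. Because PyTorch `.pt` checkpoints are pickle-based and unpickling untrusted data can execute code, load them afterwards with `torch.load(path, weights_only=True)` or use the safetensors format.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large checkpoints are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_checksum_file(checksum_file: Path) -> None:
    """Check every '<hex digest>  <filename>' line written by sha256sum; raise on mismatch."""
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        target = checksum_file.parent / name.lstrip("*")  # '*' marks binary mode
        if sha256_of(target) != expected.lower():
            raise ValueError(f"checksum mismatch for {target}")

# Usage: verify_checksum_file(Path("models/model.pt.sha256")) before loading model.pt
```
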
**Secret management**:
- Never commit API keys, database credentials, or cloud provider keys to git
- Use a secrets manager (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager)
- Model serving containers must not embed secrets; inject them at runtime via environment variables or mounted secrets

**Dependency security**:
- Pin all ML package versions (see ml-conventions) to reduce exposure to supply-chain attacks
- Run `pip-audit` or `safety check` on dependencies regularly
- Use trusted base images for Docker and scan them with Trivy or Snyk

**Model serving network security**:
- Model endpoints should not be publicly accessible without authentication
- Use VPC / private networking between application servers and model serving
- Enable TLS for all model serving endpoints, even internal ones
@@ -0,0 +1,243 @@
---
name: ml-serving-patterns
description: Model serving with TorchServe, Triton, and BentoML; batch vs realtime inference patterns; A/B testing and canary deployment strategies
topics: [ml, serving, torchserve, triton, bentoml, inference, ab-testing, canary, deployment]
---

Model serving is where ML meets production software engineering. A model that performs well in a notebook is worthless if it cannot serve predictions reliably at scale. Serving patterns address the gap between "it works on my machine" and "it handles 10,000 requests per second with P99 latency under 100ms and no race conditions." The serving layer must be treated with the same engineering rigour as any production microservice.

## Summary

Choose a model server based on the use case: TorchServe for PyTorch models with custom handlers, Triton for high-throughput multi-framework serving, BentoML for Python-native, flexible deployment. Batch inference for latency-insensitive workloads can substantially reduce serving cost. A/B testing and canary deployments are production safety patterns: never switch models by replacing the production model outright, without traffic splitting and monitoring.

## Deep Guidance

### Choosing a Model Server

**TorchServe** (AWS and Meta, PyTorch ecosystem):
- Purpose-built for PyTorch models
- Supports custom preprocessing/postprocessing handlers in Python
- REST and gRPC APIs out of the box
- Model archive format (`.mar`) bundles weights + handler + config
- Best for: PyTorch models with complex Python preprocessing, teams already in the PyTorch ecosystem

```bash
# Package a model for TorchServe
torch-model-archiver \
  --model-name resnet50 \
  --version 1.0 \
  --model-file src/models/resnet50.py \
  --serialized-file models/registry/v1.0/model.pt \
  --handler src/serving/handler.py \
  --export-path model_store/

# Start server (loads model_store/resnet50.mar)
torchserve --start --model-store model_store/ --models resnet50=resnet50.mar
```

**Triton Inference Server** (NVIDIA):
- Supports TensorFlow, PyTorch (TorchScript), ONNX, TensorRT, and Python backends
- Dynamic batching: automatically groups requests to maximise GPU utilisation
- Model ensembles: chain multiple models in a single request (preprocessing → model → postprocessing)
- Best for: high-throughput serving, GPU-accelerated inference, heterogeneous model zoos, teams optimising for throughput

Triton loads models from a model repository with one directory per model and one numbered subdirectory per version:
```
models/
├── resnet50/
│   ├── config.pbtxt      # Model configuration
│   └── 1/                # Version 1
│       └── model.onnx    # Model weights
```

**BentoML** (flexible, Python-native):
- Define serving logic in pure Python with decorators
- Packages model + dependencies + serving code into a single deployable Bento, which can be containerised as an OCI image
- Supports batch inference, adaptive batching, and multi-model composition
- Best for: rapid prototyping to production, custom serving logic, teams that want framework flexibility

```python
import bentoml

@bentoml.service(
    resources={"gpu": 1},
    traffic={"timeout": 30},
)
class TextClassifier:
    # Reference to a model in the local BentoML model store
    bento_model = bentoml.models.get("sentiment-classifier:latest")

    def __init__(self) -> None:
        # Assumes the model was saved with bentoml.transformers as a
        # text-classification pipeline; use the loader matching how it was saved
        self.pipeline = bentoml.transformers.load_model(self.bento_model)

    @bentoml.api
    def classify(self, text: str) -> dict:
        # A text-classification pipeline returns e.g. [{"label": ..., "score": ...}]
        return self.pipeline(text)[0]
```

### Predictor Interface Pattern

Regardless of the serving framework, define a clean `Predictor` interface:

```python
# src/serving/predictor.py
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional

import numpy as np
import torch

# InferencePreprocessor is project code: it must apply the same transforms as evaluation

@dataclass
class PredictionResult:
    prediction: Any
    confidence: float
    model_version: str

class Predictor:
    """Single-responsibility class for model inference."""

    def __init__(self, model_path: str, device: Optional[str] = None) -> None:
        device = device or ("cuda" if torch.cuda.is_available() else "cpu")
        self.device = torch.device(device)
        self.model = self._load_model(model_path)
        self.model.eval()
        self.model_version = self._read_version(model_path)
        self.preprocessor = InferencePreprocessor()  # Same as eval transforms

    def _load_model(self, model_path: str) -> torch.nn.Module:
        # Assumes a TorchScript artifact; adapt if you store state_dicts instead
        return torch.jit.load(model_path, map_location=self.device)

    @staticmethod
    def _read_version(model_path: str) -> str:
        # Assumes registry layout models/registry/<version>/model.pt
        return Path(model_path).parent.name

    def predict(self, raw_input: dict) -> PredictionResult:
        features = self.preprocessor.transform(raw_input)
        tensor = torch.as_tensor(features, dtype=torch.float32).unsqueeze(0).to(self.device)
        with torch.inference_mode():
            logits = self.model(tensor)
            probs = torch.softmax(logits, dim=-1)
            confidence, pred_idx = probs.max(dim=-1)
        return PredictionResult(
            prediction=pred_idx.item(),
            confidence=confidence.item(),
            model_version=self.model_version,
        )

    def predict_batch(self, inputs: list[dict]) -> list[PredictionResult]:
        """Batched inference: more efficient than looping predict()."""
        features = [self.preprocessor.transform(x) for x in inputs]
        batch = torch.as_tensor(np.stack(features), dtype=torch.float32).to(self.device)
        with torch.inference_mode():
            logits = self.model(batch)
            probs = torch.softmax(logits, dim=-1)
            confidences, pred_idxs = probs.max(dim=-1)
        return [
            PredictionResult(p.item(), c.item(), self.model_version)
            for p, c in zip(pred_idxs, confidences)
        ]
```

**Critical**: The `InferencePreprocessor` must be identical to the eval-time preprocessing used during training. Reimplementing it separately for serving is a common root cause of training-serving skew.

### Batch vs. Real-time Inference

**Real-time inference** handles individual requests under strict latency constraints:
- Use `torch.inference_mode()` rather than `torch.no_grad()`: it also disables view tracking and version-counter bumps, so it is slightly faster
- Keep the model in memory; never load it per request
- Use dynamic batching if your server supports it (it groups simultaneous requests)
- Optimise with TorchScript, ONNX export, or TensorRT for maximum throughput

**Batch inference** processes large datasets offline:
```python
# Efficient batch scoring with DataLoader
import pandas as pd
from torch.utils.data import DataLoader, Dataset
from tqdm import tqdm

def batch_score(
    predictor: Predictor,
    dataset: Dataset,
    output_path: str,
    batch_size: int = 512,
) -> None:
    # collate_fn=list keeps each batch as a list of raw input dicts,
    # which is what Predictor.predict_batch expects
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=8, collate_fn=list)
    results = []
    for batch in tqdm(loader):
        results.extend(predictor.predict_batch(batch))  # already runs in inference_mode
    # PredictionResult is a dataclass, so pandas expands its fields into columns
    pd.DataFrame(results).to_parquet(output_path)
```

**Adaptive batching** (BentoML; called dynamic batching in Triton): The server accumulates requests for a short window (e.g., 10ms) and processes them as a single batch. This can improve GPU utilisation substantially at the cost of a small latency increase, since the extra queueing delay is bounded by the window. Recommended for any GPU-accelerated serving endpoint.

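The batching-window trade-off can be simulated without a server. This sketch, with a hypothetical `plan_batches` helper, groups requests the way a batcher with a maximum batch size and a maximum queueing delay would: a batch is dispatched when it fills up or when its window expires. It assumes arrivals are sorted timestamps in milliseconds and ignores inference time.

```python
from typing import List, Sequence

def plan_batches(
    arrival_ms: Sequence[float],
    max_batch_size: int,
    max_delay_ms: float,
) -> List[List[int]]:
    """Group request indices into batches the way a dynamic batcher would.

    A batch opens when its first request arrives and closes when it holds
    `max_batch_size` requests or `max_delay_ms` has passed since it opened.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    window_start = 0.0
    for i, t in enumerate(arrival_ms):
        if current and t - window_start > max_delay_ms:
            batches.append(current)  # window expired before this request arrived
            current = []
        if not current:
            window_start = t  # this request opens a new window
        current.append(i)
        if len(current) == max_batch_size:
            batches.append(current)  # batch is full: dispatch immediately
            current = []
    if current:
        batches.append(current)
    return batches

print(plan_batches([0, 1, 2, 15, 16, 40], max_batch_size=4, max_delay_ms=10))
# [[0, 1, 2], [3, 4], [5]]
```

A larger window produces fuller batches (better GPU utilisation) but makes the first request in each batch wait longer; the maximum batch size caps the wait under heavy load.
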
### A/B Testing

A/B testing compares two model versions on real traffic with statistical rigour:

**Infrastructure requirements**:
1. Request router: Directs traffic to model A or B based on a hash of the user ID (not per-request randomness), so each user gets a consistent experience
2. Logging: Both models log predictions with the variant label
3. Assignment: User assignment is sticky (the same user always gets the same variant)

```python
import hashlib

# Traffic routing by user_id hash
def route_request(user_id: str, traffic_split: float = 0.5, experiment: str = "default") -> str:
    """Return 'model_a' or 'model_b' deterministically for a given user.

    Salting with the experiment name keeps assignments independent across experiments.
    """
    key = f"{experiment}:{user_id}".encode()
    bucket = int(hashlib.md5(key).hexdigest(), 16) % 100  # MD5 for bucketing only, not security
    return "model_b" if bucket < traffic_split * 100 else "model_a"
```

**Statistical requirements**:
- Define the primary metric and minimum detectable effect before starting
- Calculate the required sample size (power analysis) up front, so the test is not stopped prematurely
- Typical ML A/B test: 2–4 weeks, 50/50 split, statistical significance at p < 0.05
- Do not stop early because one variant looks better: repeatedly checking for significance inflates the Type I (false positive) error rate unless stopping rules are planned in advance

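The power analysis step can be done with the standard normal-approximation formula for comparing two proportions. A minimal sketch with a hypothetical `samples_per_arm` helper, assuming a two-sided test, a 50/50 split, and a binary primary metric such as CTR; dedicated statistics libraries offer more options.

```python
import math
from statistics import NormalDist

def samples_per_arm(
    baseline_rate: float,
    min_detectable_lift: float,
    alpha: float = 0.05,
    power: float = 0.8,
) -> int:
    """Users per variant for a two-sided, two-proportion z-test with a 50/50 split.

    min_detectable_lift is absolute, e.g. 0.02 for +2 percentage points.
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_lift
    if not (0 < p1 < 1 and 0 < p2 < 1) or min_detectable_lift == 0:
        raise ValueError("rates must lie in (0, 1) and the lift must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)
```

For a 10% baseline CTR and a minimum detectable lift of 2 percentage points at α = 0.05 and 80% power, this gives roughly 3,840 users per variant; halving the lift roughly quadruples the requirement.
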
**Guardrail metrics**: In addition to the primary metric, monitor guardrail metrics (latency, error rate, crash rate). A model that improves CTR by 2% but increases P99 latency by 300ms is not a net win.

### Canary Deployment

Canary deployment is safer than a full rollout and differs from A/B testing: the goal is operational safety, not measuring business impact.

```
Traffic Distribution During Canary:
  Old model (stable): 95%
  New model (canary):  5%

Progress if healthy:
  Old: 80%,  New: 20%
  Old: 50%,  New: 50%
  Old:  0%,  New: 100%   ← Full rollout
```

**Automated canary promotion criteria**:
- Error rate of new model below threshold (e.g., < 0.1%)
- P99 latency within budget (e.g., < 200ms)
- No accuracy regression on logged predictions vs. offline eval
- No alerts triggered in monitoring

**Rollback trigger**: If any criterion is breached during the canary period, route 100% of traffic back to the old model and open an incident. Canary rollback should be a one-command operation.

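The promotion criteria and rollback trigger translate directly into a small decision function that a deployment controller could call at each stage. A minimal sketch, assuming the health metrics are gathered elsewhere and using the example thresholds from the text (0.1% error rate, 200ms P99) and the 5 → 20 → 50 → 100% schedule; the class and function names are hypothetical.

```python
from dataclasses import dataclass

CANARY_STEPS = (5, 20, 50, 100)  # % of traffic on the new model at each stage

@dataclass
class CanaryHealth:
    error_rate: float          # fraction of failed requests, e.g. 0.0005 = 0.05%
    p99_latency_ms: float
    accuracy_regression: bool  # logged predictions vs. offline eval
    alerts_firing: bool

def next_canary_share(
    current_share: int,
    health: CanaryHealth,
    max_error_rate: float = 0.001,  # 0.1%
    max_p99_ms: float = 200.0,
) -> int:
    """Return the new model's next traffic share in percent; 0 means roll back."""
    healthy = (
        health.error_rate < max_error_rate
        and health.p99_latency_ms < max_p99_ms
        and not health.accuracy_regression
        and not health.alerts_firing
    )
    if not healthy:
        return 0  # any breach: route 100% of traffic back to the stable model
    later_steps = [s for s in CANARY_STEPS if s > current_share]
    return later_steps[0] if later_steps else 100
```

Keeping the decision a pure function makes it easy to unit-test and to reuse from whatever actually shifts the traffic (a service mesh, load balancer, or serving platform).
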
### Model Optimisation for Serving

Before deploying, optimise the model for serving throughput:

**TorchScript** (export to a static graph):
```python
import torch

model.eval()  # export in inference mode (fixes dropout / batch-norm behaviour)

# Trace-based export: records the ops executed for example_input, so any
# data-dependent control flow is frozen to the path taken during tracing
traced_model = torch.jit.trace(model, example_input)
torch.jit.save(traced_model, "model_traced.pt")

# Script-based export: compiles the Python source and preserves control flow
scripted_model = torch.jit.script(model)
torch.jit.save(scripted_model, "model_scripted.pt")
```

**ONNX export** (framework-independent):
```python
torch.onnx.export(
    model,
    example_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},  # Variable batch size
    opset_version=17,
)
```

**Quantisation** (reduce model size and inference time):
- Post-training quantisation (PTQ): Applied after training; the accuracy impact is small for many models
- Quantisation-aware training (QAT): Simulates quantisation during training; better accuracy for sensitive models
- INT8 quantisation often gives a 2–4x speedup with < 1% accuracy drop on hardware with fast INT8 kernels, but measure on your own model

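To show where INT8 savings come from, here is the arithmetic of symmetric per-tensor quantisation: each float is stored as an 8-bit integer q with w ≈ scale × q, where scale maps the largest absolute weight to 127. A pure-Python sketch for illustration only; real toolchains (PyTorch's quantisation APIs, TensorRT) also quantise activations, calibrate ranges on sample data, and often use per-channel scales.

```python
from typing import List, Sequence, Tuple

def quantize_int8_symmetric(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Per-tensor symmetric INT8 quantisation: w ≈ scale * q, with q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate float weights."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.0, 0.333]
q, scale = quantize_int8_symmetric(weights)
print(q)  # [50, -127, 0, 33]
```

Rounding bounds the per-weight error by scale / 2, which is why tensors with a few large outliers quantise poorly: the outliers inflate the scale for every other weight.
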
**TensorRT** (NVIDIA, maximum GPU throughput):
- Optimises ONNX models for specific GPU hardware
- Applies layer fusion, kernel auto-tuning, and precision calibration
- Typically delivers the highest inference throughput on NVIDIA GPUs in production