@zigrivers/scaffold 3.8.0 → 3.9.1
- package/README.md +73 -8
- package/content/knowledge/browser-extension/browser-extension-architecture.md +195 -0
- package/content/knowledge/browser-extension/browser-extension-content-scripts.md +264 -0
- package/content/knowledge/browser-extension/browser-extension-conventions.md +156 -0
- package/content/knowledge/browser-extension/browser-extension-cross-browser.md +229 -0
- package/content/knowledge/browser-extension/browser-extension-dev-environment.md +247 -0
- package/content/knowledge/browser-extension/browser-extension-manifest.md +220 -0
- package/content/knowledge/browser-extension/browser-extension-project-structure.md +183 -0
- package/content/knowledge/browser-extension/browser-extension-requirements.md +107 -0
- package/content/knowledge/browser-extension/browser-extension-security.md +202 -0
- package/content/knowledge/browser-extension/browser-extension-service-workers.md +265 -0
- package/content/knowledge/browser-extension/browser-extension-store-submission.md +155 -0
- package/content/knowledge/browser-extension/browser-extension-testing.md +270 -0
- package/content/knowledge/data-pipeline/data-pipeline-architecture.md +175 -0
- package/content/knowledge/data-pipeline/data-pipeline-batch-patterns.md +263 -0
- package/content/knowledge/data-pipeline/data-pipeline-conventions.md +176 -0
- package/content/knowledge/data-pipeline/data-pipeline-dev-environment.md +350 -0
- package/content/knowledge/data-pipeline/data-pipeline-orchestration.md +291 -0
- package/content/knowledge/data-pipeline/data-pipeline-project-structure.md +257 -0
- package/content/knowledge/data-pipeline/data-pipeline-quality.md +324 -0
- package/content/knowledge/data-pipeline/data-pipeline-requirements.md +145 -0
- package/content/knowledge/data-pipeline/data-pipeline-schema-management.md +295 -0
- package/content/knowledge/data-pipeline/data-pipeline-security.md +326 -0
- package/content/knowledge/data-pipeline/data-pipeline-streaming-patterns.md +280 -0
- package/content/knowledge/data-pipeline/data-pipeline-testing.md +406 -0
- package/content/knowledge/ml/ml-architecture.md +172 -0
- package/content/knowledge/ml/ml-conventions.md +209 -0
- package/content/knowledge/ml/ml-dev-environment.md +299 -0
- package/content/knowledge/ml/ml-experiment-tracking.md +285 -0
- package/content/knowledge/ml/ml-model-evaluation.md +256 -0
- package/content/knowledge/ml/ml-observability.md +253 -0
- package/content/knowledge/ml/ml-project-structure.md +216 -0
- package/content/knowledge/ml/ml-requirements.md +138 -0
- package/content/knowledge/ml/ml-security.md +188 -0
- package/content/knowledge/ml/ml-serving-patterns.md +243 -0
- package/content/knowledge/ml/ml-testing.md +301 -0
- package/content/knowledge/ml/ml-training-patterns.md +269 -0
- package/content/methodology/browser-extension-overlay.yml +82 -0
- package/content/methodology/data-pipeline-overlay.yml +70 -0
- package/content/methodology/ml-overlay.yml +70 -0
- package/dist/cli/commands/init.d.ts +13 -0
- package/dist/cli/commands/init.d.ts.map +1 -1
- package/dist/cli/commands/init.js +122 -2
- package/dist/cli/commands/init.js.map +1 -1
- package/dist/cli/commands/init.test.js +120 -0
- package/dist/cli/commands/init.test.js.map +1 -1
- package/dist/config/schema.d.ts +864 -48
- package/dist/config/schema.d.ts.map +1 -1
- package/dist/config/schema.js +53 -0
- package/dist/config/schema.js.map +1 -1
- package/dist/config/schema.test.js +166 -3
- package/dist/config/schema.test.js.map +1 -1
- package/dist/core/assembly/overlay-loader.test.js +33 -0
- package/dist/core/assembly/overlay-loader.test.js.map +1 -1
- package/dist/e2e/project-type-overlays.test.d.ts +2 -2
- package/dist/e2e/project-type-overlays.test.js +499 -33
- package/dist/e2e/project-type-overlays.test.js.map +1 -1
- package/dist/types/config.d.ts +10 -1
- package/dist/types/config.d.ts.map +1 -1
- package/dist/wizard/questions.d.ts +17 -1
- package/dist/wizard/questions.d.ts.map +1 -1
- package/dist/wizard/questions.js +75 -1
- package/dist/wizard/questions.js.map +1 -1
- package/dist/wizard/questions.test.js +167 -0
- package/dist/wizard/questions.test.js.map +1 -1
- package/dist/wizard/wizard.d.ts +13 -0
- package/dist/wizard/wizard.d.ts.map +1 -1
- package/dist/wizard/wizard.js +17 -1
- package/dist/wizard/wizard.js.map +1 -1
- package/package.json +1 -1
|
@@ -0,0 +1,301 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: ml-testing
|
|
3
|
+
description: Unit tests for data transforms, tolerance-based model tests, pipeline integration tests, and regression tests for ML systems
|
|
4
|
+
topics: [ml, testing, unit-tests, model-tests, pipeline-tests, regression-tests, tdd]
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
ML code is often tested less rigorously than traditional software because "the model is probabilistic" feels like a valid excuse for skipping tests. It is not. The vast majority of ML code (data transforms, preprocessing, feature engineering, postprocessing, and serving logic) is deterministic and must be unit tested. The probabilistic parts (model weights and accuracy) require tolerance-based tests and regression baselines. Untested ML pipelines fail silently in ways that are expensive to diagnose in production.
|
|
8
|
+
|
|
9
|
+
## Summary
|
|
10
|
+
|
|
11
|
+
Test ML systems at four levels: unit tests for deterministic components (transforms, metrics, preprocessing), model tests using tolerance-based assertions (output shape, value range, basic accuracy on canonical examples), pipeline tests (end-to-end training and inference on small data), and regression tests (compare new model against production baseline). Use `pytest` with `torch.testing` and `numpy.testing` for numerical assertions. Run tests in CI on every commit.
|
|
12
|
+
|
|
13
|
+
## Deep Guidance
|
|
14
|
+
|
|
15
|
+
### What to Test in ML
|
|
16
|
+
|
|
17
|
+
**Always unit test**:
|
|
18
|
+
- Data loading and preprocessing transforms
|
|
19
|
+
- Feature engineering functions
|
|
20
|
+
- Custom loss functions
|
|
21
|
+
- Metric computation functions
|
|
22
|
+
- Postprocessing logic (thresholding, calibration)
|
|
23
|
+
- Model architecture components (custom layers, attention mechanisms)
|
|
24
|
+
|
|
25
|
+
**Always model test** (tolerance-based):
|
|
26
|
+
- Model output shape matches expected shape
|
|
27
|
+
- Output values are in valid range (probabilities sum to 1, logits are finite)
|
|
28
|
+
- Model forward pass runs without error
|
|
29
|
+
- Model handles edge cases (empty input, max-length input, all-zero input)
|
|
30
|
+
- Basic sanity check: model achieves above-chance accuracy on a canonical small dataset
|
|
31
|
+
|
|
32
|
+
**Always pipeline test**:
|
|
33
|
+
- Full training pipeline runs on a tiny dataset without error
|
|
34
|
+
- Checkpoint save and load produces identical predictions
|
|
35
|
+
- Inference pipeline produces output in the correct format
|
|
36
|
+
|
|
37
|
+
**Always regression test**:
|
|
38
|
+
- New model version's accuracy on held-out test set does not regress beyond a threshold vs. the current production baseline
|
|
39
|
+
|
|
40
|
+
### Unit Tests for Data Transforms
|
|
41
|
+
|
|
42
|
+
```python
# tests/test_transforms.py
import pytest
import numpy as np
import torch
from src.data.transforms import (
    Normalizer,
    TextTokenizer,
    ImageAugmenter,
)

class TestNormalizer:
    def test_zero_mean(self):
        """Normalized features should have near-zero mean on training data."""
        X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
        norm = Normalizer()
        norm.fit(X)
        X_norm = norm.transform(X)
        np.testing.assert_allclose(X_norm.mean(axis=0), 0.0, atol=1e-6)

    def test_unit_std(self):
        """Normalized features should have unit standard deviation."""
        X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
        norm = Normalizer()
        norm.fit(X)
        X_norm = norm.transform(X)
        np.testing.assert_allclose(X_norm.std(axis=0), 1.0, atol=1e-6)

    def test_transform_without_fit_raises(self):
        """Transform before fit must raise a clear error."""
        norm = Normalizer()
        with pytest.raises(RuntimeError, match="fit"):
            norm.transform(np.array([[1.0, 2.0]]))

    def test_inverse_transform_roundtrip(self):
        """fit + transform + inverse_transform should return original values."""
        rng = np.random.default_rng(0)  # Seeded so a failure is reproducible
        X = rng.random((100, 5)) * 10.0
        norm = Normalizer()
        norm.fit(X)
        X_rt = norm.inverse_transform(norm.transform(X))
        np.testing.assert_allclose(X_rt, X, rtol=1e-5)

    def test_no_fit_leakage_to_test_data(self):
        """Test data stats must not affect normalisation parameters."""
        X_train = np.array([[1.0], [2.0], [3.0]])
        X_test = np.array([[100.0], [200.0]])  # Very different distribution
        norm = Normalizer()
        norm.fit(X_train)
        X_test_norm = norm.transform(X_test)
        # Test data should be normalised using TRAINING stats only
        assert np.all(np.abs(X_test_norm) > 1.0)  # Large because distribution is different


class TestTextTokenizer:
    def test_output_shape(self):
        """Tokenizer must produce correct sequence length."""
        tokenizer = TextTokenizer(max_length=128)
        result = tokenizer("Hello world, this is a test.")
        assert result["input_ids"].shape == (128,)
        assert result["attention_mask"].shape == (128,)

    def test_truncation(self):
        """Long inputs must be truncated to max_length."""
        tokenizer = TextTokenizer(max_length=8)
        long_text = " ".join(["word"] * 100)
        result = tokenizer(long_text)
        assert result["input_ids"].shape == (8,)

    def test_empty_input(self):
        """Empty string must not raise an exception."""
        tokenizer = TextTokenizer(max_length=128)
        result = tokenizer("")
        assert result["input_ids"].shape == (128,)
        # Everything except the special tokens should be padding
        assert result["attention_mask"].sum() <= 2  # Only CLS and/or SEP attended
```
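
The tests above assume a `Normalizer` that exposes `fit`, `transform`, and `inverse_transform` and raises `RuntimeError` when used before fitting. The real `src.data.transforms.Normalizer` is not shown, so the following is only a minimal sketch of that contract. The class name `ListNormalizer` is hypothetical, and it works on plain Python lists rather than NumPy arrays so that it runs without dependencies.

```python
import math


class ListNormalizer:
    """Hypothetical stand-in for Normalizer: per-column standardisation."""

    def __init__(self):
        self.means = None
        self.stds = None

    def fit(self, rows):
        n = len(rows)
        cols = list(zip(*rows))
        self.means = [sum(col) / n for col in cols]
        # Population standard deviation (ddof=0); a constant column falls back to 1.0.
        self.stds = [
            math.sqrt(sum((x - m) ** 2 for x in col) / n) or 1.0
            for col, m in zip(cols, self.means)
        ]
        return self

    def _check_fitted(self):
        if self.means is None:
            raise RuntimeError("call fit() before transform()")

    def transform(self, rows):
        self._check_fitted()
        return [
            [(x - m) / s for x, m, s in zip(row, self.means, self.stds)]
            for row in rows
        ]

    def inverse_transform(self, rows):
        self._check_fitted()
        return [
            [x * s + m for x, m, s in zip(row, self.means, self.stds)]
            for row in rows
        ]


norm = ListNormalizer().fit([[1.0], [2.0], [3.0]])
# Test rows are scaled with the training statistics, never with their own.
scaled_test = norm.transform([[100.0], [200.0]])
```

Keeping the statistics on the fitted object is what makes the leakage test meaningful: `transform` has no way to see the statistics of the data it is given.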
|
|
118
|
+
|
|
119
|
+
### Model Tests
|
|
120
|
+
|
|
121
|
+
```python
# tests/test_model.py
import pytest
import torch
import torch.nn.functional as F
from src.models.classifier import TextClassifier

@pytest.fixture
def model():
    return TextClassifier(vocab_size=1000, hidden_dim=64, num_classes=3)

@pytest.fixture
def batch():
    return {
        "input_ids": torch.randint(0, 1000, (4, 128)),
        "attention_mask": torch.ones(4, 128, dtype=torch.long),
    }

class TestTextClassifier:
    def test_output_shape(self, model, batch):
        """Model output shape must match (batch_size, num_classes)."""
        output = model(**batch)
        assert output.shape == (4, 3)

    def test_output_finite(self, model, batch):
        """Model output must not contain NaN or Inf."""
        output = model(**batch)
        assert torch.all(torch.isfinite(output)), "Model output contains NaN or Inf"

    def test_probabilities_sum_to_one(self, model, batch):
        """Softmax probabilities must sum to 1."""
        logits = model(**batch)
        probs = F.softmax(logits, dim=-1)
        torch.testing.assert_close(
            probs.sum(dim=-1),
            torch.ones(4),
            atol=1e-5,
            rtol=1e-5,
        )

    def test_different_inputs_different_outputs(self, model):
        """Different inputs must produce different outputs (model is not constant)."""
        batch_a = {"input_ids": torch.zeros(2, 128, dtype=torch.long),
                   "attention_mask": torch.ones(2, 128, dtype=torch.long)}
        batch_b = {"input_ids": torch.ones(2, 128, dtype=torch.long),
                   "attention_mask": torch.ones(2, 128, dtype=torch.long)}
        output_a = model(**batch_a)
        output_b = model(**batch_b)
        assert not torch.allclose(output_a, output_b), "Model outputs identical for different inputs"

    def test_eval_mode_deterministic(self, model, batch):
        """Same input in eval mode must produce identical outputs (no dropout randomness)."""
        model.eval()
        with torch.no_grad():
            output_1 = model(**batch)
            output_2 = model(**batch)
        torch.testing.assert_close(output_1, output_2)

    def test_gradient_flows(self, model, batch):
        """Gradients must flow to all parameters during backward pass."""
        model.train()
        logits = model(**batch)
        loss = logits.sum()
        loss.backward()
        for name, param in model.named_parameters():
            if param.requires_grad:
                assert param.grad is not None, f"No gradient for parameter: {name}"
                assert torch.any(param.grad != 0), f"Zero gradient for parameter: {name}"
```
|
|
190
|
+
|
|
191
|
+
### Pipeline Tests
|
|
192
|
+
|
|
193
|
+
```python
# tests/test_pipeline.py
import pytest
import tempfile
import os
from omegaconf import OmegaConf
from src.training.trainer import Trainer

@pytest.fixture
def tiny_config():
    """Minimal config for fast pipeline smoke test."""
    return OmegaConf.create({
        "training": {"epochs": 2, "batch_size": 4, "seed": 42},
        "optimizer": {"type": "adam", "lr": 1e-3},
        "data": {"num_samples": 32},  # Tiny dataset
    })

class TestTrainingPipeline:
    def test_training_runs_without_error(self, tiny_config, tmp_path):
        """Full training pipeline must complete without error on tiny data."""
        trainer = Trainer(cfg=tiny_config, output_dir=str(tmp_path))
        result = trainer.fit()
        assert "val_loss" in result
        assert result["val_loss"] < float("inf")

    def test_checkpoint_saves_and_loads(self, tiny_config, tmp_path):
        """Checkpoint must be saved and restored with identical predictions."""
        trainer = Trainer(cfg=tiny_config, output_dir=str(tmp_path))
        trainer.fit()

        checkpoint_path = tmp_path / "best.pt"
        assert checkpoint_path.exists(), "Checkpoint was not saved"

        # Load checkpoint and verify predictions are identical
        import torch
        from src.models.classifier import TextClassifier
        model_a = trainer.model
        model_b = TextClassifier.from_checkpoint(str(checkpoint_path))

        test_input = torch.randint(0, 1000, (2, 128))
        model_a.eval()
        model_b.eval()
        with torch.no_grad():
            torch.testing.assert_close(model_a(test_input), model_b(test_input))

    def test_inference_pipeline_output_format(self, tiny_config, tmp_path):
        """Inference pipeline must return predictions in expected format."""
        trainer = Trainer(cfg=tiny_config, output_dir=str(tmp_path))
        trainer.fit()

        from src.serving.predictor import Predictor
        predictor = Predictor(str(tmp_path / "best.pt"))
        result = predictor.predict({"text": "test input"})

        assert hasattr(result, "prediction")
        assert hasattr(result, "confidence")
        assert 0.0 <= result.confidence <= 1.0
```
|
|
251
|
+
|
|
252
|
+
### Regression Tests
|
|
253
|
+
|
|
254
|
+
```python
# tests/test_regression.py
"""
Regression tests compare a new model version against the production baseline.
Run these before promoting any model to staging.
"""
import pytest
import numpy as np
from src.evaluation.evaluator import evaluate_model
from src.models.classifier import TextClassifier

PRODUCTION_BASELINE = {
    "accuracy": 0.872,
    "f1": 0.864,
    "roc_auc": 0.934,
}
REGRESSION_TOLERANCE = 0.02  # Allow an absolute drop of up to 0.02 (2 percentage points)

class TestModelRegression:
    @pytest.fixture(scope="class")
    def candidate_metrics(self, holdout_dataset):
        """Evaluate the candidate model on the holdout set.

        holdout_dataset must itself be a class-scoped (or broader) fixture,
        e.g. defined in conftest.py; pytest rejects narrower-scoped dependencies.
        """
        model = TextClassifier.from_registry("candidate")
        return evaluate_model(model, holdout_dataset)

    def test_accuracy_no_regression(self, candidate_metrics):
        threshold = PRODUCTION_BASELINE["accuracy"] - REGRESSION_TOLERANCE
        assert candidate_metrics["accuracy"] >= threshold, (
            f"Accuracy regression: {candidate_metrics['accuracy']:.3f} < {threshold:.3f}"
        )

    def test_f1_no_regression(self, candidate_metrics):
        threshold = PRODUCTION_BASELINE["f1"] - REGRESSION_TOLERANCE
        assert candidate_metrics["f1"] >= threshold

    def test_roc_auc_no_regression(self, candidate_metrics):
        threshold = PRODUCTION_BASELINE["roc_auc"] - REGRESSION_TOLERANCE
        assert candidate_metrics["roc_auc"] >= threshold
```
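
The three near-identical assertions above can also be expressed as one comparison that reports every regressed metric at once. This is a sketch only: `find_regressions` is a hypothetical helper, and the candidate numbers are made up for illustration rather than taken from any real evaluation.

```python
def find_regressions(candidate, baseline, tolerance):
    """Return {metric: (candidate_value, threshold)} for each metric below its threshold."""
    failures = {}
    for name, base_value in baseline.items():
        threshold = base_value - tolerance
        if candidate[name] < threshold:
            failures[name] = (candidate[name], threshold)
    return failures


baseline = {"accuracy": 0.872, "f1": 0.864, "roc_auc": 0.934}
candidate = {"accuracy": 0.860, "f1": 0.830, "roc_auc": 0.940}  # illustrative values only
failures = find_regressions(candidate, baseline, tolerance=0.02)
```

A single test can then assert `not failures` and include the dictionary in the failure message, which shows all regressed metrics in one run instead of stopping at the first.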
|
|
293
|
+
|
|
294
|
+
### Testing Best Practices for ML
|
|
295
|
+
|
|
296
|
+
- **Test data must not touch training data**: Use a separate fixture dataset for tests, not samples from the training set
|
|
297
|
+
- **Tests must be fast**: Unit and model tests must run in under 10 seconds total; use tiny models and tiny data
|
|
298
|
+
- **Parametrize for edge cases**: Use `@pytest.mark.parametrize` to test multiple input types (empty, max-length, all-zeros, all-ones)
|
|
299
|
+
- **Numerical precision**: Use `rtol`/`atol` tolerances in `numpy.testing.assert_allclose` and `torch.testing.assert_close` — never use `==` for floats
|
|
300
|
+
- **Mock heavy dependencies**: Mock database connections, S3 calls, and MLflow logging in unit tests — tests must not require external services to run
|
|
301
|
+
- **CI enforcement**: Run `pytest tests/` in CI on every commit; block PRs that break tests
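
To make the numerical-precision rule concrete, here is a small standard-library sketch of why `==` is unsafe for floats. It uses `math.isclose` as a stand-in for the tolerance checks that `numpy.testing.assert_allclose` and `torch.testing.assert_close` perform on arrays and tensors; the tolerances chosen are illustrative.

```python
import math

# Ten additions of 0.1 accumulate binary rounding error.
total = sum([0.1] * 10)

exact_equal = (total == 1.0)  # exact comparison is brittle
close_enough = math.isclose(total, 1.0, rel_tol=1e-9, abs_tol=1e-12)
```

The exact comparison fails even though the arithmetic is "obviously" correct, while the tolerance-based comparison passes. Model outputs go through far more floating-point operations than this, so tolerances are needed there even more.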
|
|
@@ -0,0 +1,269 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: ml-training-patterns
|
|
3
|
+
description: Data loaders, training loops, distributed training with DDP and FSDP, checkpointing strategies, and hyperparameter tuning patterns
|
|
4
|
+
topics: [ml, training, data-loaders, distributed-training, ddp, fsdp, checkpointing, hyperparameter-tuning]
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
The training loop is the heart of every ML project, but it is also where most bugs hide: data leaking between splits, gradients not zeroed, mixed precision overflows, checkpoints saved incorrectly, and distributed training hanging on a single slow worker. These are not exotic edge cases — they are the standard bugs that every ML engineer encounters. A well-structured training pipeline prevents them through clear separation of concerns and defensive coding.
|
|
8
|
+
|
|
9
|
+
## Summary
|
|
10
|
+
|
|
11
|
+
Build training pipelines with properly configured data loaders (worker count, pinned memory, prefetch), clean training loops with explicit gradient management, mixed precision for efficiency, and robust checkpointing. For large models or large datasets, use PyTorch DDP for multi-GPU training or FSDP for models too large to fit on a single GPU. Manage hyperparameter search with a systematic tool (Optuna, Ray Tune, W&B Sweeps) rather than manual iteration.
|
|
12
|
+
|
|
13
|
+
## Deep Guidance
|
|
14
|
+
|
|
15
|
+
### Data Loaders
|
|
16
|
+
|
|
17
|
+
`torch.utils.data.DataLoader` is the standard interface for batched data loading. Configure it correctly:
|
|
18
|
+
|
|
19
|
+
```python
from torch.utils.data import DataLoader
from src.data.dataset import MyDataset

def build_dataloader(
    dataset: MyDataset,
    batch_size: int,
    split: str,
    num_workers: int = 4,
) -> DataLoader:
    is_train = split == "train"
    use_workers = num_workers > 0
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=is_train,                # Shuffle only training data
        num_workers=num_workers,         # Parallel data loading workers
        pin_memory=True,                 # Pin CPU memory for faster GPU transfer
        # prefetch_factor and persistent_workers are only valid with worker
        # processes; DataLoader raises ValueError if they are set while
        # num_workers=0 (prefetch_factor=None requires PyTorch 2.0+).
        prefetch_factor=2 if use_workers else None,  # Prefetch 2 batches per worker
        persistent_workers=use_workers,  # Keep workers alive between epochs
        drop_last=is_train,              # Drop incomplete final batch (training only)
    )
```
|
|
41
|
+
|
|
42
|
+
**`num_workers` guidance**:
|
|
43
|
+
- Start with `min(os.cpu_count(), 8)` and tune from there
|
|
44
|
+
- Set to 0 for debugging (single-process, easier stack traces)
|
|
45
|
+
- On Windows, set to 0 if you encounter multiprocessing issues
|
|
46
|
+
- Bottleneck check: if GPU utilisation stays below 80%, increase workers or raise `prefetch_factor`
|
|
47
|
+
|
|
48
|
+
**Common data loader bugs**:
|
|
49
|
+
- Using `shuffle=True` on validation/test sets (breaks reproducibility checks)
|
|
50
|
+
- Not setting `worker_init_fn` when using random augmentation in workers (without it, workers can end up with identical random streams, notably for NumPy-based augmentation)
|
|
51
|
+
- `pin_memory=True` on a machine without GPU (no-op but wastes memory)
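
The `num_workers` starting point above can be wrapped in a small helper. This is a sketch under assumptions: `default_num_workers`, its `debug` flag, and the cap of 8 are illustrative names and values, not part of PyTorch.

```python
import os


def default_num_workers(debug: bool = False, cap: int = 8) -> int:
    """Pick a starting num_workers value; tune from there with real measurements."""
    if debug:
        return 0  # single-process loading gives simpler stack traces
    cpu_count = os.cpu_count() or 1  # os.cpu_count() can return None
    return min(cpu_count, cap)


workers = default_num_workers()
```

The `or 1` guard matters because `min(None, 8)` raises a `TypeError` on platforms where the CPU count cannot be determined.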
|
|
52
|
+
|
|
53
|
+
### Training Loop Structure
|
|
54
|
+
|
|
55
|
+
```python
import torch
from torch import nn
from torch.utils.data import DataLoader

def train_epoch(
    model: nn.Module,
    loader: DataLoader,
    optimizer: torch.optim.Optimizer,
    criterion: nn.Module,
    scaler: torch.cuda.amp.GradScaler,
    device: torch.device,
) -> dict[str, float]:
    model.train()
    total_loss = 0.0
    n_batches = 0

    for batch in loader:
        inputs, targets = batch
        inputs = inputs.to(device, non_blocking=True)
        targets = targets.to(device, non_blocking=True)

        # Zero gradients BEFORE forward pass
        # set_to_none=True frees gradient memory instead of zero-filling
        # (it is the default since PyTorch 2.0)
        optimizer.zero_grad(set_to_none=True)

        # Mixed precision forward pass
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            outputs = model(inputs)
            loss = criterion(outputs, targets)

        # Scaled backward pass
        scaler.scale(loss).backward()

        # Unscale first, then clip: clipping must see the true gradient magnitudes
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

        # Optimizer step
        scaler.step(optimizer)
        scaler.update()

        total_loss += loss.item()
        n_batches += 1

    return {"loss": total_loss / n_batches}
```
|
|
97
|
+
|
|
98
|
+
**Critical training loop rules**:
|
|
99
|
+
1. `model.train()` before training, `model.eval()` before evaluation — these affect BatchNorm and Dropout
|
|
100
|
+
2. `optimizer.zero_grad()` at the start of each batch, not the end
|
|
101
|
+
3. Clip gradients before the optimizer step (with a `GradScaler`, call `scaler.unscale_(optimizer)` first so clipping sees unscaled gradients)
|
|
102
|
+
4. Use `loss.item()` (not `loss`) when accumulating: `.item()` returns a plain Python float, so the running total does not keep each batch's computation graph alive
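
Rule 3 refers to clipping by global norm. As a rough illustration of what that operation does to the gradients, here is a dependency-free sketch on nested lists; `clip_by_global_norm` is a hypothetical helper written for this example, not the PyTorch implementation.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale all gradients by one factor so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    scale = min(1.0, max_norm / (total_norm + eps))  # never scales gradients up
    clipped = [[g * scale for g in vec] for vec in grads]
    return clipped, total_norm


# Two "parameters" whose combined gradient norm is 5.0.
clipped, norm_before = clip_by_global_norm([[3.0, 4.0], [0.0]], max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
```

Because one scale factor is applied to every gradient, the update direction is preserved and only its length shrinks. This is also why clipping scaled mixed-precision gradients would be wrong: the norm being compared against `max_norm` would be inflated by the loss scale.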
|
|
103
|
+
|
|
104
|
+
### Mixed Precision Training
|
|
105
|
+
|
|
106
|
+
Mixed precision (float16/bfloat16 for computation, float32 for parameters) typically provides 2–3x speedup on modern GPUs with minimal accuracy impact:
|
|
107
|
+
|
|
108
|
+
```python
# Setup
scaler = torch.cuda.amp.GradScaler()

# Training step (shown above)
with torch.autocast(device_type="cuda", dtype=torch.float16):
    outputs = model(inputs)
    loss = criterion(outputs, targets)

scaler.scale(loss).backward()
scaler.unscale_(optimizer)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
scaler.step(optimizer)
scaler.update()
```
|
|
123
|
+
|
|
124
|
+
**bfloat16 vs float16**:
|
|
125
|
+
- `bfloat16`: Same dynamic range as float32, less precision. Better for training stability. Requires Ampere GPU (A100, A30, RTX 30xx) or newer.
|
|
126
|
+
- `float16`: Better precision than bfloat16 but narrower dynamic range (overflow risk). Works on all CUDA GPUs.
|
|
127
|
+
- Default to `bfloat16` on Ampere+, `float16` on older GPUs.
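
The narrow dynamic range of `float16` is the reason a `GradScaler` is needed at all. The following standard-library sketch illustrates it by round-tripping values through IEEE-754 half precision with `struct`. The helper name `to_float16` is made up for this example, and the scale of 65536 matches `GradScaler`'s default initial scale.

```python
import struct


def to_float16(x: float) -> float:
    """Round-trip a Python float through IEEE-754 half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


tiny_grad = 1e-8
underflowed = to_float16(tiny_grad)        # too small for float16, becomes zero
rescued = to_float16(tiny_grad * 65536.0)  # scaled value is representable

try:
    to_float16(70000.0)                    # float16 tops out at 65504
    overflowed = False
except OverflowError:
    overflowed = True
```

Scaling the loss shifts small gradients into the representable range before the backward pass, and `unscale_` restores their true magnitude afterwards. `bfloat16` keeps the exponent range of `float32`, so this underflow and overflow behaviour is much less of a concern there.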
|
|
128
|
+
|
|
129
|
+
### Checkpointing
|
|
130
|
+
|
|
131
|
+
Save and restore training state completely — not just model weights:
|
|
132
|
+
|
|
133
|
+
```python
import torch
from torch import nn

def save_checkpoint(
    path: str,
    model: nn.Module,
    optimizer: torch.optim.Optimizer,
    scheduler,
    scaler: torch.cuda.amp.GradScaler,
    epoch: int,
    metrics: dict,
) -> None:
    # Persist everything needed to resume, not just the weights
    torch.save({
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "scheduler_state_dict": scheduler.state_dict(),
        "scaler_state_dict": scaler.state_dict(),
        "metrics": metrics,
    }, path)

def load_checkpoint(path: str, model, optimizer, scheduler, scaler):
    # Load onto CPU first; load_state_dict copies tensors to each object's device
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    scheduler.load_state_dict(checkpoint["scheduler_state_dict"])
    scaler.load_state_dict(checkpoint["scaler_state_dict"])
    return checkpoint["epoch"], checkpoint["metrics"]
```
|
|
160
|
+
|
|
161
|
+
**Checkpoint strategy**:
|
|
162
|
+
- Save every N epochs AND on best validation metric (two separate files)
|
|
163
|
+
- Keep last K checkpoints (delete older ones to save disk)
|
|
164
|
+
- Always test checkpoint resume; otherwise bugs in resume code surface partway through long training runs rather than in testing
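
The "keep last K" rule can be sketched with the standard library alone. Everything here is an assumption for illustration: the `epoch_<N>.pt` naming scheme, the separate `best.pt` file, and the helper name `prune_checkpoints` are not defined by the source.

```python
import tempfile
from pathlib import Path


def prune_checkpoints(ckpt_dir, keep_last, pattern="epoch_*.pt"):
    """Delete all but the newest keep_last periodic checkpoints; return the deleted paths."""
    def epoch_of(path):
        return int(path.stem.split("_")[1])  # numeric sort, so epoch_10 follows epoch_9

    ckpts = sorted(Path(ckpt_dir).glob(pattern), key=epoch_of)
    stale = ckpts[:-keep_last] if keep_last > 0 else ckpts
    for path in stale:
        path.unlink()
    return stale


# Demonstration on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for epoch in range(5):
        (Path(tmp) / f"epoch_{epoch}.pt").touch()
    (Path(tmp) / "best.pt").touch()
    prune_checkpoints(tmp, keep_last=2)
    remaining = sorted(p.name for p in Path(tmp).iterdir())
```

Because the best-metric checkpoint does not match the glob pattern, it is never pruned, which keeps the two kinds of checkpoint independent as the strategy above recommends.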
|
|
165
|
+
|
|
166
|
+
### Distributed Training: DDP
|
|
167
|
+
|
|
168
|
+
PyTorch DistributedDataParallel (DDP) is the standard for multi-GPU training. Each GPU runs an independent process with a full model copy; gradients are averaged across GPUs (all-reduce) during each backward pass:
|
|
169
|
+
|
|
170
|
+
```python
# Launch: torchrun --nproc_per_node=4 train.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# MyModel, dataset, batch_size_per_gpu, epochs, and train_epoch are
# placeholders for project-specific definitions.
def train_distributed():
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    local_rank = int(os.environ["LOCAL_RANK"])  # Set by torchrun
    world_size = dist.get_world_size()

    device = torch.device(f"cuda:{local_rank}")
    torch.cuda.set_device(device)

    model = MyModel().to(device)
    model = DDP(model, device_ids=[local_rank])

    # Each rank gets a different data partition
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, sampler=sampler, batch_size=batch_size_per_gpu)

    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # Required for shuffle to work correctly
        train_epoch(model, loader, ...)

        # Save only from rank 0
        if rank == 0:
            torch.save(model.module.state_dict(), "model.pt")  # .module unwraps DDP

    dist.destroy_process_group()
```
|
|
202
|
+
|
|
203
|
+
**DDP best practices**:
|
|
204
|
+
- Use `torchrun` (not `torch.multiprocessing.spawn`) for launch
|
|
205
|
+
- Effective batch size = `batch_size_per_gpu × world_size` — scale learning rate accordingly (linear scaling rule)
|
|
206
|
+
- Always call `sampler.set_epoch(epoch)`; otherwise every epoch reuses the same shuffle order
|
|
207
|
+
- Log and save only from rank 0
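
The linear scaling rule mentioned above is simple arithmetic. The sketch below assumes a reference learning rate tuned at a known reference batch size; `scale_learning_rate` and its parameter names are illustrative, and the rule itself is a heuristic that is usually combined with learning-rate warmup rather than a guarantee.

```python
def scale_learning_rate(base_lr, base_batch_size, batch_size_per_gpu, world_size):
    """Scale a reference learning rate in proportion to the effective batch size."""
    effective_batch_size = batch_size_per_gpu * world_size
    scaled_lr = base_lr * effective_batch_size / base_batch_size
    return scaled_lr, effective_batch_size


# A learning rate tuned at batch size 32 on one GPU, moved to 4 GPUs with 32 samples each.
lr, effective = scale_learning_rate(
    base_lr=1e-3, base_batch_size=32, batch_size_per_gpu=32, world_size=4
)
```

Here the effective batch size becomes 128, so the learning rate is multiplied by 4. Treat the result as a starting point to validate, especially at large world sizes.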
|
|
208
|
+
|
|
209
|
+
### Distributed Training: FSDP
|
|
210
|
+
|
|
211
|
+
Fully Sharded Data Parallel (FSDP) shards model parameters across GPUs, enabling training of models too large for a single GPU:
|
|
212
|
+
|
|
213
|
+
```python
import functools

import torch
from torch import nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

bf16_policy = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

# transformer_auto_wrap_policy must be told which module class is one
# transformer layer. nn.TransformerEncoderLayer is a placeholder here;
# substitute the block class your model actually uses.
wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={nn.TransformerEncoderLayer},
)

model = FSDP(
    model,
    mixed_precision=bf16_policy,
    auto_wrap_policy=wrap_policy,  # Wrap each transformer layer
)
```
|
|
230
|
+
|
|
231
|
+
Use FSDP when model parameters exceed single-GPU memory. Use DDP when the model fits on one GPU — DDP is simpler and has less communication overhead.
|
|
232
|
+
|
|
233
|
+
### Hyperparameter Tuning
|
|
234
|
+
|
|
235
|
+
Never tune hyperparameters manually at scale. Use a systematic search tool:
|
|
236
|
+
|
|
237
|
+
**Optuna** (open source, flexible):
|
|
238
|
+
```python
import optuna

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])
    dropout = trial.suggest_float("dropout", 0.0, 0.5)

    model = build_model(dropout=dropout)
    val_loss = train_and_evaluate(model, lr=lr, batch_size=batch_size)
    return val_loss  # Optuna minimises by default

study = optuna.create_study(direction="minimize", sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=50, n_jobs=4)

print(f"Best params: {study.best_params}")
print(f"Best value: {study.best_value:.4f}")
```
|
|
256
|
+
|
|
257
|
+
**Key hyperparameters to tune** (in order of impact):
|
|
258
|
+
1. Learning rate (most impactful — always tune first)
|
|
259
|
+
2. Batch size (affects generalisation and training speed)
|
|
260
|
+
3. Architecture (model size, depth, width)
|
|
261
|
+
4. Regularisation (dropout, weight decay)
|
|
262
|
+
5. Learning rate schedule (warmup steps, decay type)
|
|
263
|
+
|
|
264
|
+
**Search strategies**:
|
|
265
|
+
- Random search: Surprisingly effective, easy to parallelise
|
|
266
|
+
- Bayesian optimisation (TPE in Optuna): More efficient for small budgets
|
|
267
|
+
- Grid search: Only for 1–2 hyperparameters with small ranges
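
Random search needs very little machinery, which is part of why it parallelises so easily. The sketch below draws configurations from the same ranges as the Optuna example, sampling the learning rate log-uniformly. `sample_config` is a hypothetical helper, and the seed and the number of draws are arbitrary.

```python
import math
import random


def sample_config(rng: random.Random) -> dict:
    """Draw one hyperparameter configuration; learning rate is log-uniform."""
    log_lr = rng.uniform(math.log(1e-5), math.log(1e-2))
    return {
        "lr": math.exp(log_lr),
        "batch_size": rng.choice([16, 32, 64]),
        "dropout": rng.uniform(0.0, 0.5),
    }


rng = random.Random(0)  # seeded so the search itself is reproducible
configs = [sample_config(rng) for _ in range(20)]
```

Sampling the learning rate in log space gives each order of magnitude equal weight. Uniform sampling between 1e-5 and 1e-2 would place almost every draw above 1e-3.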
|
|
268
|
+
|
|
269
|
+
Report the best result with multiple seeds (mean ± std) — a single seed result may be a lucky draw.
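
Reporting mean ± std across seeds takes only a few lines. In this sketch the validation losses are placeholder numbers standing in for one run per seed, and `summarize` is an illustrative helper.

```python
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) over per-seed scores."""
    return statistics.mean(scores), statistics.stdev(scores)


val_losses = [0.412, 0.398, 0.405, 0.421, 0.409]  # placeholder values, one per seed
mean, std = summarize(val_losses)
report = f"{mean:.3f} ± {std:.3f}"
```

`statistics.stdev` is the sample standard deviation (dividing by n − 1), which is the appropriate choice when a handful of seeds stands in for the full distribution of possible runs.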
|
|
@@ -0,0 +1,82 @@
|
|
|
1
|
+
# methodology/browser-extension-overlay.yml
name: browser-extension
description: >
  Browser extension overlay: injects extension domain knowledge into existing
  pipeline steps for manifest configuration, content scripts, service workers,
  cross-browser compatibility, store submission, and security patterns.
project-type: browser-extension

# ---------------------------------------------------------------------------
# knowledge-overrides
# ---------------------------------------------------------------------------
# Map browser-extension knowledge entries into existing pipeline steps so that
# browser extension domain expertise is injected during prompt assembly.
# Includes UX/design steps (extensions have UI).
# Maps: manifest → tech-stack + coding-standards; content-scripts → security;
# service-workers → system-architecture; cross-browser → tdd + add-e2e-testing;
# store-submission → operations.
knowledge-overrides:
  # Foundational
  create-prd:
    append: [browser-extension-requirements]
  user-stories:
    append: [browser-extension-requirements]
  # manifest → coding-standards
  coding-standards:
    append: [browser-extension-conventions, browser-extension-manifest]
  project-structure:
    append: [browser-extension-project-structure]
  dev-env-setup:
    append: [browser-extension-dev-environment]
  git-workflow:
    append: [browser-extension-conventions]

  # Architecture & Design
  # service-workers → system-architecture
  system-architecture:
    append: [browser-extension-architecture, browser-extension-service-workers]
  # manifest → tech-stack
  tech-stack:
    append: [browser-extension-architecture, browser-extension-manifest]
  adrs:
    append: [browser-extension-architecture]
  domain-modeling:
    append: [browser-extension-architecture]
  # UX/design steps (extensions have UI)
  ux-spec:
    append: [browser-extension-architecture]
  design-system:
    append: [browser-extension-conventions]
  # content-scripts → security
  security:
    append: [browser-extension-security, browser-extension-content-scripts]
  # store-submission → operations
  operations:
    append: [browser-extension-store-submission]

  # Testing
  # cross-browser → tdd + add-e2e-testing
  tdd:
    append: [browser-extension-testing, browser-extension-cross-browser]
  add-e2e-testing:
    append: [browser-extension-testing, browser-extension-cross-browser]
  create-evals:
    append: [browser-extension-testing]
  story-tests:
    append: [browser-extension-testing]

  # Reviews (mirror authoring steps)
  review-architecture:
    append: [browser-extension-architecture, browser-extension-service-workers]
  review-ux:
    append: [browser-extension-architecture]
  review-security:
    append: [browser-extension-security, browser-extension-content-scripts]
  review-operations:
    append: [browser-extension-store-submission]
  review-testing:
    append: [browser-extension-testing, browser-extension-cross-browser]

  # Planning
  implementation-plan:
    append: [browser-extension-architecture]
|