@zigrivers/scaffold 3.8.0 → 3.9.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (70)
  1. package/README.md +73 -8
  2. package/content/knowledge/browser-extension/browser-extension-architecture.md +195 -0
  3. package/content/knowledge/browser-extension/browser-extension-content-scripts.md +264 -0
  4. package/content/knowledge/browser-extension/browser-extension-conventions.md +156 -0
  5. package/content/knowledge/browser-extension/browser-extension-cross-browser.md +229 -0
  6. package/content/knowledge/browser-extension/browser-extension-dev-environment.md +247 -0
  7. package/content/knowledge/browser-extension/browser-extension-manifest.md +220 -0
  8. package/content/knowledge/browser-extension/browser-extension-project-structure.md +183 -0
  9. package/content/knowledge/browser-extension/browser-extension-requirements.md +107 -0
  10. package/content/knowledge/browser-extension/browser-extension-security.md +202 -0
  11. package/content/knowledge/browser-extension/browser-extension-service-workers.md +265 -0
  12. package/content/knowledge/browser-extension/browser-extension-store-submission.md +155 -0
  13. package/content/knowledge/browser-extension/browser-extension-testing.md +270 -0
  14. package/content/knowledge/data-pipeline/data-pipeline-architecture.md +175 -0
  15. package/content/knowledge/data-pipeline/data-pipeline-batch-patterns.md +263 -0
  16. package/content/knowledge/data-pipeline/data-pipeline-conventions.md +176 -0
  17. package/content/knowledge/data-pipeline/data-pipeline-dev-environment.md +350 -0
  18. package/content/knowledge/data-pipeline/data-pipeline-orchestration.md +291 -0
  19. package/content/knowledge/data-pipeline/data-pipeline-project-structure.md +257 -0
  20. package/content/knowledge/data-pipeline/data-pipeline-quality.md +324 -0
  21. package/content/knowledge/data-pipeline/data-pipeline-requirements.md +145 -0
  22. package/content/knowledge/data-pipeline/data-pipeline-schema-management.md +295 -0
  23. package/content/knowledge/data-pipeline/data-pipeline-security.md +326 -0
  24. package/content/knowledge/data-pipeline/data-pipeline-streaming-patterns.md +280 -0
  25. package/content/knowledge/data-pipeline/data-pipeline-testing.md +406 -0
  26. package/content/knowledge/ml/ml-architecture.md +172 -0
  27. package/content/knowledge/ml/ml-conventions.md +209 -0
  28. package/content/knowledge/ml/ml-dev-environment.md +299 -0
  29. package/content/knowledge/ml/ml-experiment-tracking.md +285 -0
  30. package/content/knowledge/ml/ml-model-evaluation.md +256 -0
  31. package/content/knowledge/ml/ml-observability.md +253 -0
  32. package/content/knowledge/ml/ml-project-structure.md +216 -0
  33. package/content/knowledge/ml/ml-requirements.md +138 -0
  34. package/content/knowledge/ml/ml-security.md +188 -0
  35. package/content/knowledge/ml/ml-serving-patterns.md +243 -0
  36. package/content/knowledge/ml/ml-testing.md +301 -0
  37. package/content/knowledge/ml/ml-training-patterns.md +269 -0
  38. package/content/methodology/browser-extension-overlay.yml +82 -0
  39. package/content/methodology/data-pipeline-overlay.yml +70 -0
  40. package/content/methodology/ml-overlay.yml +70 -0
  41. package/dist/cli/commands/init.d.ts +13 -0
  42. package/dist/cli/commands/init.d.ts.map +1 -1
  43. package/dist/cli/commands/init.js +122 -2
  44. package/dist/cli/commands/init.js.map +1 -1
  45. package/dist/cli/commands/init.test.js +120 -0
  46. package/dist/cli/commands/init.test.js.map +1 -1
  47. package/dist/config/schema.d.ts +864 -48
  48. package/dist/config/schema.d.ts.map +1 -1
  49. package/dist/config/schema.js +53 -0
  50. package/dist/config/schema.js.map +1 -1
  51. package/dist/config/schema.test.js +166 -3
  52. package/dist/config/schema.test.js.map +1 -1
  53. package/dist/core/assembly/overlay-loader.test.js +33 -0
  54. package/dist/core/assembly/overlay-loader.test.js.map +1 -1
  55. package/dist/e2e/project-type-overlays.test.d.ts +2 -2
  56. package/dist/e2e/project-type-overlays.test.js +499 -33
  57. package/dist/e2e/project-type-overlays.test.js.map +1 -1
  58. package/dist/types/config.d.ts +10 -1
  59. package/dist/types/config.d.ts.map +1 -1
  60. package/dist/wizard/questions.d.ts +17 -1
  61. package/dist/wizard/questions.d.ts.map +1 -1
  62. package/dist/wizard/questions.js +75 -1
  63. package/dist/wizard/questions.js.map +1 -1
  64. package/dist/wizard/questions.test.js +167 -0
  65. package/dist/wizard/questions.test.js.map +1 -1
  66. package/dist/wizard/wizard.d.ts +13 -0
  67. package/dist/wizard/wizard.d.ts.map +1 -1
  68. package/dist/wizard/wizard.js +17 -1
  69. package/dist/wizard/wizard.js.map +1 -1
  70. package/package.json +1 -1
@@ -0,0 +1,301 @@
+ ---
+ name: ml-testing
+ description: Unit tests for data transforms, tolerance-based model tests, pipeline integration tests, and regression tests for ML systems
+ topics: [ml, testing, unit-tests, model-tests, pipeline-tests, regression-tests, tdd]
+ ---
+
+ ML code is tested less rigorously than traditional software because "the model is probabilistic" feels like an excuse for skipping tests. It is not. The vast majority of ML code (data transforms, preprocessing, feature engineering, postprocessing, and serving logic) is deterministic and must be unit tested. The probabilistic parts (model weights and accuracy) require tolerance-based tests and regression baselines. Untested ML pipelines fail silently in ways that are expensive to diagnose in production.
+
+ ## Summary
+
+ Test ML systems at four levels: unit tests for deterministic components (transforms, metrics, preprocessing), model tests using tolerance-based assertions (output shape, value range, basic accuracy on canonical examples), pipeline tests (end-to-end training and inference on small data), and regression tests (compare new model against production baseline). Use `pytest` with `torch.testing` and `numpy.testing` for numerical assertions. Run tests in CI on every commit.
+
+ ## Deep Guidance
+
+ ### What to Test in ML
+
+ **Always unit test**:
+ - Data loading and preprocessing transforms
+ - Feature engineering functions
+ - Custom loss functions
+ - Metric computation functions
+ - Postprocessing logic (thresholding, calibration)
+ - Model architecture components (custom layers, attention mechanisms)
+
+ **Always model test** (tolerance-based):
+ - Model output shape matches expected shape
+ - Output values are in valid range (probabilities sum to 1, logits are finite)
+ - Model forward pass runs without error
+ - Model handles edge cases (empty input, max-length input, all-zero input)
+ - Basic sanity check: model achieves above-chance accuracy on a canonical small dataset
+
+ **Always pipeline test**:
+ - Full training pipeline runs on a tiny dataset without error
+ - Checkpoint save and load produces identical predictions
+ - Inference pipeline produces output in the correct format
+
+ **Always regression test**:
+ - New model version's accuracy on held-out test set does not regress beyond a threshold vs. the current production baseline
+
+ ### Unit Tests for Data Transforms
+
+ ```python
+ # tests/test_transforms.py
+ import pytest
+ import numpy as np
+ import torch
+ from src.data.transforms import (
+     Normalizer,
+     TextTokenizer,
+     ImageAugmenter,
+ )
+
+ class TestNormalizer:
+     def test_zero_mean(self):
+         """Normalized features should have near-zero mean on training data."""
+         X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
+         norm = Normalizer()
+         norm.fit(X)
+         X_norm = norm.transform(X)
+         np.testing.assert_allclose(X_norm.mean(axis=0), 0.0, atol=1e-6)
+
+     def test_unit_std(self):
+         """Normalized features should have unit standard deviation."""
+         X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
+         norm = Normalizer()
+         norm.fit(X)
+         X_norm = norm.transform(X)
+         np.testing.assert_allclose(X_norm.std(axis=0), 1.0, atol=1e-6)
+
+     def test_transform_without_fit_raises(self):
+         """Transform before fit must raise a clear error."""
+         norm = Normalizer()
+         with pytest.raises(RuntimeError, match="fit"):
+             norm.transform(np.array([[1.0, 2.0]]))
+
+     def test_inverse_transform_roundtrip(self):
+         """fit + transform + inverse_transform should return original values."""
+         X = np.random.default_rng(0).random((100, 5)) * 10.0  # Seeded so failures are reproducible
+         norm = Normalizer()
+         norm.fit(X)
+         X_rt = norm.inverse_transform(norm.transform(X))
+         np.testing.assert_allclose(X_rt, X, rtol=1e-5)
+
+     def test_no_fit_leakage_to_test_data(self):
+         """Test data stats must not affect normalisation parameters."""
+         X_train = np.array([[1.0], [2.0], [3.0]])
+         X_test = np.array([[100.0], [200.0]])  # Very different distribution
+         norm = Normalizer()
+         norm.fit(X_train)
+         X_test_norm = norm.transform(X_test)
+         # Test data should be normalised using TRAINING stats only
+         assert np.all(np.abs(X_test_norm) > 1.0)  # Large because distribution is different
+
+
+ class TestTextTokenizer:
+     def test_output_shape(self):
+         """Tokenizer must produce correct sequence length."""
+         tokenizer = TextTokenizer(max_length=128)
+         result = tokenizer("Hello world, this is a test.")
+         assert result["input_ids"].shape == (128,)
+         assert result["attention_mask"].shape == (128,)
+
+     def test_truncation(self):
+         """Long inputs must be truncated to max_length."""
+         tokenizer = TextTokenizer(max_length=8)
+         long_text = " ".join(["word"] * 100)
+         result = tokenizer(long_text)
+         assert result["input_ids"].shape == (8,)
+
+     def test_empty_input(self):
+         """Empty string must not raise an exception."""
+         tokenizer = TextTokenizer(max_length=128)
+         result = tokenizer("")
+         assert result["input_ids"].shape == (128,)
+         # All tokens after CLS should be PAD
+         assert result["attention_mask"].sum() <= 2  # Only CLS and/or SEP attended
+ ```
+
+ ### Model Tests
+
+ ```python
+ # tests/test_model.py
+ import pytest
+ import torch
+ import torch.nn.functional as F
+ from src.models.classifier import TextClassifier
+
+ @pytest.fixture
+ def model():
+     return TextClassifier(vocab_size=1000, hidden_dim=64, num_classes=3)
+
+ @pytest.fixture
+ def batch():
+     return {
+         "input_ids": torch.randint(0, 1000, (4, 128)),
+         "attention_mask": torch.ones(4, 128, dtype=torch.long),
+     }
+
+ class TestTextClassifier:
+     def test_output_shape(self, model, batch):
+         """Model output shape must match (batch_size, num_classes)."""
+         output = model(**batch)
+         assert output.shape == (4, 3)
+
+     def test_output_finite(self, model, batch):
+         """Model output must not contain NaN or Inf."""
+         output = model(**batch)
+         assert torch.all(torch.isfinite(output)), "Model output contains NaN or Inf"
+
+     def test_probabilities_sum_to_one(self, model, batch):
+         """Softmax probabilities must sum to 1."""
+         logits = model(**batch)
+         probs = F.softmax(logits, dim=-1)
+         torch.testing.assert_close(
+             probs.sum(dim=-1),
+             torch.ones(4),
+             atol=1e-5,
+             rtol=1e-5,
+         )
+
+     def test_different_inputs_different_outputs(self, model):
+         """Different inputs must produce different outputs (model is not constant)."""
+         batch_a = {"input_ids": torch.zeros(2, 128, dtype=torch.long),
+                    "attention_mask": torch.ones(2, 128, dtype=torch.long)}
+         batch_b = {"input_ids": torch.ones(2, 128, dtype=torch.long),
+                    "attention_mask": torch.ones(2, 128, dtype=torch.long)}
+         output_a = model(**batch_a)
+         output_b = model(**batch_b)
+         assert not torch.allclose(output_a, output_b), "Model outputs identical for different inputs"
+
+     def test_eval_mode_deterministic(self, model, batch):
+         """Same input in eval mode must produce identical outputs (no dropout randomness)."""
+         model.eval()
+         with torch.no_grad():
+             output_1 = model(**batch)
+             output_2 = model(**batch)
+         torch.testing.assert_close(output_1, output_2)
+
+     def test_gradient_flows(self, model, batch):
+         """Gradients must flow to all parameters during backward pass."""
+         model.train()
+         logits = model(**batch)
+         loss = logits.sum()
+         loss.backward()
+         for name, param in model.named_parameters():
+             if param.requires_grad:
+                 assert param.grad is not None, f"No gradient for parameter: {name}"
+                 assert torch.any(param.grad != 0), f"Zero gradient for parameter: {name}"
+ ```
+
+ ### Pipeline Tests
+
+ ```python
+ # tests/test_pipeline.py
+ import pytest
+ import tempfile
+ import os
+ from omegaconf import OmegaConf
+ from src.training.trainer import Trainer
+
+ @pytest.fixture
+ def tiny_config():
+     """Minimal config for fast pipeline smoke test."""
+     return OmegaConf.create({
+         "training": {"epochs": 2, "batch_size": 4, "seed": 42},
+         "optimizer": {"type": "adam", "lr": 1e-3},
+         "data": {"num_samples": 32},  # Tiny dataset
+     })
+
+ class TestTrainingPipeline:
+     def test_training_runs_without_error(self, tiny_config, tmp_path):
+         """Full training pipeline must complete without error on tiny data."""
+         trainer = Trainer(cfg=tiny_config, output_dir=str(tmp_path))
+         result = trainer.fit()
+         assert "val_loss" in result
+         assert result["val_loss"] < float("inf")
+
+     def test_checkpoint_saves_and_loads(self, tiny_config, tmp_path):
+         """Checkpoint must be saved and restored with identical predictions."""
+         trainer = Trainer(cfg=tiny_config, output_dir=str(tmp_path))
+         trainer.fit()
+
+         checkpoint_path = tmp_path / "best.pt"
+         assert checkpoint_path.exists(), "Checkpoint was not saved"
+
+         # Load checkpoint and verify predictions are identical
+         import torch
+         from src.models.classifier import TextClassifier
+         model_a = trainer.model
+         model_b = TextClassifier.from_checkpoint(str(checkpoint_path))
+
+         test_input = torch.randint(0, 1000, (2, 128))
+         model_a.eval()
+         model_b.eval()
+         with torch.no_grad():
+             torch.testing.assert_close(model_a(test_input), model_b(test_input))
+
+     def test_inference_pipeline_output_format(self, tiny_config, tmp_path):
+         """Inference pipeline must return predictions in expected format."""
+         trainer = Trainer(cfg=tiny_config, output_dir=str(tmp_path))
+         trainer.fit()
+
+         from src.serving.predictor import Predictor
+         predictor = Predictor(str(tmp_path / "best.pt"))
+         result = predictor.predict({"text": "test input"})
+
+         assert hasattr(result, "prediction")
+         assert hasattr(result, "confidence")
+         assert 0.0 <= result.confidence <= 1.0
+ ```
+
+ ### Regression Tests
+
+ ```python
+ # tests/test_regression.py
+ """
+ Regression tests compare a new model version against the production baseline.
+ Run these before promoting any model to staging.
+ """
+ import pytest
+ import numpy as np
+ from src.evaluation.evaluator import evaluate_model
+ from src.models.classifier import TextClassifier
+
+ PRODUCTION_BASELINE = {
+     "accuracy": 0.872,
+     "f1": 0.864,
+     "roc_auc": 0.934,
+ }
+ REGRESSION_TOLERANCE = 0.02  # Allow up to 2pp regression
+
+ class TestModelRegression:
+     @pytest.fixture(scope="class")
+     def candidate_metrics(self, holdout_dataset):
+         """Evaluate the candidate model on the holdout set."""
+         model = TextClassifier.from_registry("candidate")
+         return evaluate_model(model, holdout_dataset)
+
+     def test_accuracy_no_regression(self, candidate_metrics):
+         threshold = PRODUCTION_BASELINE["accuracy"] - REGRESSION_TOLERANCE
+         assert candidate_metrics["accuracy"] >= threshold, (
+             f"Accuracy regression: {candidate_metrics['accuracy']:.3f} < {threshold:.3f}"
+         )
+
+     def test_f1_no_regression(self, candidate_metrics):
+         threshold = PRODUCTION_BASELINE["f1"] - REGRESSION_TOLERANCE
+         assert candidate_metrics["f1"] >= threshold
+
+     def test_roc_auc_no_regression(self, candidate_metrics):
+         threshold = PRODUCTION_BASELINE["roc_auc"] - REGRESSION_TOLERANCE
+         assert candidate_metrics["roc_auc"] >= threshold
+ ```
+
+ ### Testing Best Practices for ML
+
+ - **Test data must not touch training data**: Use a separate fixture dataset for tests, not samples from the training set
+ - **Tests must be fast**: Unit and model tests must run in < 10 seconds total; use tiny models and tiny data
+ - **Parametrize for edge cases**: Use `@pytest.mark.parametrize` to test multiple input types (empty, max-length, all-zeros, all-ones)
+ - **Numerical precision**: Use `rtol`/`atol` tolerances in `numpy.testing.assert_allclose` and `torch.testing.assert_close`; never use `==` for floats
+ - **Mock heavy dependencies**: Mock database connections, S3 calls, and MLflow logging in unit tests; tests must not require external services to run
+ - **CI enforcement**: Run `pytest tests/` in CI on every commit; block PRs that break tests
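
The numerical-precision bullet above can be illustrated with a minimal, stdlib-only sketch. It uses `math.isclose` as a stand-in for `numpy.testing.assert_allclose` (the two combine `rtol` and `atol` slightly differently), and the helper name `assert_close` is hypothetical, not part of the package.

```python
import math


def assert_close(actual: float, expected: float,
                 rtol: float = 1e-5, atol: float = 1e-8) -> None:
    """Raise AssertionError unless the two floats agree within tolerance.

    Hypothetical helper for illustration; real tests would normally call
    numpy.testing.assert_allclose or torch.testing.assert_close instead.
    """
    if not math.isclose(actual, expected, rel_tol=rtol, abs_tol=atol):
        raise AssertionError(
            f"{actual!r} != {expected!r} (rtol={rtol}, atol={atol})"
        )


total = 0.1 + 0.2

# Exact comparison fails because 0.1 and 0.2 have no exact binary representation
exact_match = (total == 0.3)

# Tolerance-based comparison passes: the difference is far below rtol
assert_close(total, 0.3)
```

The same pattern applies to arrays and tensors: pick tolerances that reflect the precision of the computation rather than asserting bitwise equality.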
@@ -0,0 +1,269 @@
+ ---
+ name: ml-training-patterns
+ description: Data loaders, training loops, distributed training with DDP and FSDP, checkpointing strategies, and hyperparameter tuning patterns
+ topics: [ml, training, data-loaders, distributed-training, ddp, fsdp, checkpointing, hyperparameter-tuning]
+ ---
+
+ The training loop is the heart of every ML project, but it is also where most bugs hide: data leaking between splits, gradients not zeroed, mixed precision overflows, checkpoints saved incorrectly, and distributed training hanging on a single slow worker. These are not exotic edge cases; they are the standard bugs that every ML engineer encounters. A well-structured training pipeline prevents them through clear separation of concerns and defensive coding.
+
+ ## Summary
+
+ Build training pipelines with properly configured data loaders (worker count, pinned memory, prefetch), clean training loops with explicit gradient management, mixed precision for efficiency, and robust checkpointing. For large models or large datasets, use PyTorch DDP for multi-GPU training or FSDP for models too large to fit on a single GPU. Manage hyperparameter search with a systematic tool (Optuna, Ray Tune, W&B Sweeps) rather than manual iteration.
+
+ ## Deep Guidance
+
+ ### Data Loaders
+
+ `torch.utils.data.DataLoader` is the standard interface for batched data loading. Configure it correctly:
+
+ ```python
+ from torch.utils.data import DataLoader
+ from src.data.dataset import MyDataset
+
+ def build_dataloader(
+     dataset: MyDataset,
+     batch_size: int,
+     split: str,
+     num_workers: int = 4,
+ ) -> DataLoader:
+     is_train = split == "train"
+     use_workers = num_workers > 0  # The two worker-only options below raise ValueError when num_workers=0
+     return DataLoader(
+         dataset,
+         batch_size=batch_size,
+         shuffle=is_train,                            # Shuffle only training data
+         num_workers=num_workers,                     # Parallel data loading workers
+         pin_memory=True,                             # Pin CPU memory for faster GPU transfer
+         prefetch_factor=2 if use_workers else None,  # Prefetch 2 batches per worker (None is required in PyTorch 2.x when num_workers=0)
+         persistent_workers=use_workers,              # Keep workers alive between epochs
+         drop_last=is_train,                          # Drop incomplete final batch (training only)
+     )
+ ```
+
+ **`num_workers` guidance**:
+ - Start with `min(os.cpu_count(), 8)` and tune from there
+ - Set to 0 for debugging (single-process, easier stack traces)
+ - On Windows, set to 0 if you encounter multiprocessing issues
+ - Bottleneck check: if GPU utilisation < 80%, increase workers or enable prefetch
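
As a rough illustration of the `num_workers` guidance above, the following sketch picks a starting worker count. The function name `default_num_workers` and its `cap` and `debug` parameters are hypothetical; note that `os.cpu_count()` can return `None`, so a bare `min(os.cpu_count(), 8)` may fail on some platforms.

```python
import os


def default_num_workers(cap: int = 8, debug: bool = False) -> int:
    """Pick a starting DataLoader worker count (illustrative helper only)."""
    if debug:
        # Single-process loading gives simpler stack traces
        return 0
    # os.cpu_count() may return None when the count cannot be determined
    cpu_count = os.cpu_count() or 1
    return min(cpu_count, cap)


num_workers = default_num_workers()
```

Treat the result as a starting point and tune it against measured GPU utilisation.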
+
+ **Common data loader bugs**:
+ - Using `shuffle=True` on validation/test sets (breaks reproducibility checks)
+ - Not setting `worker_init_fn` when using random augmentation in workers (PyTorch seeds its own RNG per worker, but seeds for other libraries such as NumPy may be duplicated across workers without this)
+ - `pin_memory=True` on a machine without GPU (no benefit, and PyTorch emits a warning)
+
+ ### Training Loop Structure
+
+ ```python
+ import torch
+ from torch import nn
+ from torch.utils.data import DataLoader
+
+ def train_epoch(
+     model: nn.Module,
+     loader: DataLoader,
+     optimizer: torch.optim.Optimizer,
+     criterion: nn.Module,
+     scaler: torch.cuda.amp.GradScaler,
+     device: torch.device,
+ ) -> dict[str, float]:
+     model.train()
+     total_loss = 0.0
+     n_batches = 0
+
+     for batch in loader:
+         inputs, targets = batch
+         inputs = inputs.to(device, non_blocking=True)
+         targets = targets.to(device, non_blocking=True)
+
+         # Zero gradients BEFORE forward pass
+         optimizer.zero_grad(set_to_none=True)  # Faster than zero_grad()
+
+         # Mixed precision forward pass
+         with torch.autocast(device_type="cuda", dtype=torch.float16):
+             outputs = model(inputs)
+             loss = criterion(outputs, targets)
+
+         # Scaled backward pass
+         scaler.scale(loss).backward()
+
+         # Unscale first, then clip, so max_norm applies to the true gradient norms
+         scaler.unscale_(optimizer)
+         torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
+
+         # Optimizer step
+         scaler.step(optimizer)
+         scaler.update()
+
+         total_loss += loss.item()
+         n_batches += 1
+
+     return {"loss": total_loss / n_batches}
+ ```
+
+ **Critical training loop rules**:
+ 1. `model.train()` before training, `model.eval()` before evaluation; these modes change the behaviour of BatchNorm and Dropout
+ 2. `optimizer.zero_grad()` at the start of each batch, not the end
+ 3. Clip gradients before the optimizer step; with a `GradScaler`, call `scaler.unscale_(optimizer)` first so the threshold applies to unscaled gradients
+ 4. Use `loss.item()` (not `loss`) when accumulating; `.item()` returns a plain Python float, so the running total does not keep the computation graph alive
+
+ ### Mixed Precision Training
+
+ Mixed precision (float16/bfloat16 for computation, float32 for parameters) typically provides 2–3x speedup on modern GPUs with minimal accuracy impact:
+
+ ```python
+ # Setup
+ scaler = torch.cuda.amp.GradScaler()
+
+ # Training step (shown above)
+ with torch.autocast(device_type="cuda", dtype=torch.float16):
+     outputs = model(inputs)
+     loss = criterion(outputs, targets)
+
+ scaler.scale(loss).backward()
+ scaler.unscale_(optimizer)
+ torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
+ scaler.step(optimizer)
+ scaler.update()
+ ```
+
+ **bfloat16 vs float16**:
+ - `bfloat16`: Same dynamic range as float32, less precision. Better for training stability. Requires Ampere GPU (A100, A30, RTX 30xx) or newer.
+ - `float16`: Better precision than bfloat16 but narrower dynamic range (overflow risk). Works on all CUDA GPUs.
+ - Default to `bfloat16` on Ampere+, `float16` on older GPUs.
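
The range-versus-precision trade-off above can be seen without a GPU. The sketch below is a stdlib-only emulation under stated assumptions: `to_float16` uses the `struct` module's half-precision format, and `to_bfloat16` approximates bfloat16 by truncating a float32 to its top 16 bits (real hardware typically rounds to nearest). Both helper names are hypothetical.

```python
import struct


def to_float16(x: float) -> float:
    """Round-trip x through IEEE half precision (5 exponent bits, 10 mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bfloat16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding.

    This truncates instead of rounding, which keeps the sketch short.
    """
    top_bits = struct.pack(">f", x)[:2]  # sign, 8 exponent bits, 7 mantissa bits
    return struct.unpack(">f", top_bits + b"\x00\x00")[0]


# Precision: float16 keeps a value close to 1.001, bfloat16 collapses it to 1.0
fp16_value = to_float16(1.001)
bf16_value = to_bfloat16(1.001)

# Range: 70000 exceeds the float16 maximum (65504) but is representable,
# coarsely, in bfloat16
try:
    to_float16(70000.0)
    fp16_overflows = False
except OverflowError:
    fp16_overflows = True
bf16_large = to_bfloat16(70000.0)
```

The overflow case is the reason float16 training relies on loss scaling, while bfloat16 trades away mantissa bits to keep the float32 exponent range.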
+
+ ### Checkpointing
+
+ Save and restore training state completely, not just model weights:
+
+ ```python
+ import torch
+ from torch import nn
+
+ def save_checkpoint(
+     path: str,
+     model: nn.Module,
+     optimizer: torch.optim.Optimizer,
+     scheduler,
+     scaler: torch.cuda.amp.GradScaler,
+     epoch: int,
+     metrics: dict,
+ ) -> None:
+     torch.save({
+         "epoch": epoch,
+         "model_state_dict": model.state_dict(),
+         "optimizer_state_dict": optimizer.state_dict(),
+         "scheduler_state_dict": scheduler.state_dict(),
+         "scaler_state_dict": scaler.state_dict(),
+         "metrics": metrics,
+     }, path)
+
+ def load_checkpoint(path: str, model, optimizer, scheduler, scaler):
+     checkpoint = torch.load(path, map_location="cpu")
+     model.load_state_dict(checkpoint["model_state_dict"])
+     optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
+     scheduler.load_state_dict(checkpoint["scheduler_state_dict"])
+     scaler.load_state_dict(checkpoint["scaler_state_dict"])
+     return checkpoint["epoch"], checkpoint["metrics"]
+ ```
+
+ **Checkpoint strategy**:
+ - Save every N epochs AND on best validation metric (two separate files)
+ - Keep last K checkpoints (delete older ones to save disk)
+ - Always test checkpoint resume: bugs in resume code tend to surface in production during long training runs, not in testing
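
The "keep last K checkpoints" rule above might be implemented along these lines. This is a sketch under assumptions: the helper name `prune_checkpoints`, the `epoch_*.pt` naming scheme, and the zero-padded epoch numbers are all hypothetical choices, not something the package prescribes.

```python
from pathlib import Path


def prune_checkpoints(ckpt_dir: Path, keep_last: int = 3,
                      pattern: str = "epoch_*.pt") -> list[Path]:
    """Delete all but the newest `keep_last` periodic checkpoints.

    Assumes zero-padded names such as epoch_0007.pt, so sorting by name
    matches sorting by epoch. A best-metric file such as best.pt does not
    match the pattern and is never touched.
    """
    checkpoints = sorted(ckpt_dir.glob(pattern))
    # A slice of [:-0] would be empty, so handle keep_last == 0 explicitly
    stale = checkpoints[:-keep_last] if keep_last > 0 else checkpoints
    for path in stale:
        path.unlink()
    return stale
```

Calling `prune_checkpoints(Path("checkpoints"), keep_last=3)` after each periodic save keeps disk usage bounded while leaving the best-metric checkpoint in place.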
+
+ ### Distributed Training: DDP
+
+ PyTorch DistributedDataParallel (DDP) is the standard for multi-GPU training. Each GPU runs an independent process with a full model copy; gradients are averaged across GPUs during each backward pass:
+
+ ```python
+ # Launch: torchrun --nproc_per_node=4 train.py
+ import os
+
+ import torch
+ import torch.distributed as dist
+ from torch.nn.parallel import DistributedDataParallel as DDP
+ from torch.utils.data import DataLoader
+ from torch.utils.data.distributed import DistributedSampler
+
+ # MyModel, dataset, batch_size_per_gpu, epochs, and train_epoch are defined elsewhere
+ def train_distributed():
+     dist.init_process_group(backend="nccl")
+     rank = dist.get_rank()
+     local_rank = int(os.environ["LOCAL_RANK"])
+     world_size = dist.get_world_size()
+
+     device = torch.device(f"cuda:{local_rank}")
+     torch.cuda.set_device(device)
+
+     model = MyModel().to(device)
+     model = DDP(model, device_ids=[local_rank])
+
+     # Each rank gets a different data partition
+     sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
+     loader = DataLoader(dataset, sampler=sampler, batch_size=batch_size_per_gpu)
+
+     for epoch in range(epochs):
+         sampler.set_epoch(epoch)  # Required for shuffle to work correctly
+         train_epoch(model, loader, ...)
+
+     # Save only from rank 0
+     if rank == 0:
+         torch.save(model.module.state_dict(), "model.pt")  # .module unwraps DDP
+
+     dist.destroy_process_group()
+ ```
+
+ **DDP best practices**:
+ - Use `torchrun` (not `torch.multiprocessing.spawn`) for launch
+ - Effective batch size = `batch_size_per_gpu × world_size`; scale the learning rate accordingly (linear scaling rule)
+ - Always call `sampler.set_epoch(epoch)`; otherwise every epoch reuses the same shuffle order
+ - Log and save only from rank 0
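
The linear scaling rule mentioned above amounts to one line of arithmetic. The sketch below assumes a learning rate that was tuned at a known reference batch size; the function name `scale_learning_rate` and the example numbers are illustrative only, and the rule is a heuristic that is commonly paired with a warmup schedule.

```python
def scale_learning_rate(
    base_lr: float,
    base_batch_size: int,
    batch_size_per_gpu: int,
    world_size: int,
) -> float:
    """Scale a learning rate tuned at base_batch_size to the effective DDP batch size."""
    effective_batch_size = batch_size_per_gpu * world_size
    return base_lr * effective_batch_size / base_batch_size


# Example: lr=1e-3 was tuned on one GPU with batch size 32.
# On 4 GPUs with 32 samples each, the effective batch size is 128.
scaled_lr = scale_learning_rate(
    base_lr=1e-3, base_batch_size=32, batch_size_per_gpu=32, world_size=4
)
```

Here the effective batch size grows by 4x, so the heuristic suggests a learning rate of roughly 4e-3; validate the scaled value empirically rather than trusting it blindly.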
+
+ ### Distributed Training: FSDP
+
+ Fully Sharded Data Parallel (FSDP) shards model parameters across GPUs, enabling training of models too large for a single GPU:
+
+ ```python
+ import functools
+
+ import torch
+ from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
+ from torch.distributed.fsdp import MixedPrecision
+ from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
+
+ bf16_policy = MixedPrecision(
+     param_dtype=torch.bfloat16,
+     reduce_dtype=torch.bfloat16,
+     buffer_dtype=torch.bfloat16,
+ )
+
+ # transformer_auto_wrap_policy must be told which layer class(es) to wrap.
+ # TransformerBlock is a placeholder for your model's own layer class.
+ wrap_policy = functools.partial(
+     transformer_auto_wrap_policy,
+     transformer_layer_cls={TransformerBlock},
+ )
+
+ model = FSDP(
+     model,
+     mixed_precision=bf16_policy,
+     auto_wrap_policy=wrap_policy,  # Wrap each transformer layer
+ )
+ ```
+
+ Use FSDP when model parameters exceed single-GPU memory. Use DDP when the model fits on one GPU; DDP is simpler and has less communication overhead.
+
+ ### Hyperparameter Tuning
+
+ Never tune hyperparameters manually at scale. Use a systematic search tool:
+
+ **Optuna** (open source, flexible):
+ ```python
+ import optuna
+
+ def objective(trial: optuna.Trial) -> float:
+     lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
+     batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])
+     dropout = trial.suggest_float("dropout", 0.0, 0.5)
+
+     model = build_model(dropout=dropout)
+     val_loss = train_and_evaluate(model, lr=lr, batch_size=batch_size)
+     return val_loss  # Optuna minimises by default
+
+ study = optuna.create_study(direction="minimize", sampler=optuna.samplers.TPESampler())
+ study.optimize(objective, n_trials=50, n_jobs=4)
+
+ print(f"Best params: {study.best_params}")
+ print(f"Best value: {study.best_value:.4f}")
+ ```
+
+ **Key hyperparameters to tune** (in order of impact):
+ 1. Learning rate (most impactful; always tune first)
+ 2. Batch size (affects generalisation and training speed)
+ 3. Architecture (model size, depth, width)
+ 4. Regularisation (dropout, weight decay)
+ 5. Learning rate schedule (warmup steps, decay type)
+
+ **Search strategies**:
+ - Random search: Surprisingly effective, easy to parallelise
+ - Bayesian optimisation (TPE in Optuna): More efficient for small budgets
+ - Grid search: Only for 1–2 hyperparameters with small ranges
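
For comparison with the Optuna example, random search over the same search space can be sketched in a few lines of plain Python. Everything here is an assumption for illustration: `random_search`, `sample_log_uniform`, and `toy_objective` are hypothetical names, and the toy objective merely stands in for a real validation loss.

```python
import math
import random


def sample_log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Sample uniformly in log space, the usual choice for learning rates."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def random_search(objective, n_trials: int, seed: int = 0):
    """Return (best_params, best_value) after n_trials independent random draws."""
    rng = random.Random(seed)  # Seeded so the search itself is reproducible
    best_params, best_value = None, math.inf
    for _ in range(n_trials):
        params = {
            "lr": sample_log_uniform(rng, 1e-5, 1e-2),
            "batch_size": rng.choice([16, 32, 64]),
            "dropout": rng.uniform(0.0, 0.5),
        }
        value = objective(params)
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value


def toy_objective(params: dict) -> float:
    """Stand-in for validation loss, minimised at lr=1e-3 and dropout=0.1."""
    return (math.log10(params["lr"]) + 3.0) ** 2 + (params["dropout"] - 0.1) ** 2


best_params, best_value = random_search(toy_objective, n_trials=50, seed=0)
```

Because each trial is independent, the loop parallelises trivially, which is the main practical appeal of random search.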
+
+ Report the best configuration's result across multiple seeds (mean ± std); a single-seed result may be a lucky draw.
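
A minimal sketch of the mean ± std report, using only the standard library. The helper name `summarize_seeds` is hypothetical and the scores are made-up numbers for illustration, not results from any real run.

```python
import statistics


def summarize_seeds(scores: list[float]) -> str:
    """Format per-seed scores as 'mean ± std (n=...)' using the sample standard deviation."""
    mean = statistics.mean(scores)
    # stdev needs at least two data points; report 0.0 for a single seed
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return f"{mean:.4f} ± {std:.4f} (n={len(scores)})"


# Hypothetical validation losses from three seeds of the same configuration
seed_scores = [0.412, 0.398, 0.405]
summary = summarize_seeds(seed_scores)
```

Reporting `n` alongside the spread makes it clear how many runs the estimate rests on.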
@@ -0,0 +1,82 @@
+ # methodology/browser-extension-overlay.yml
+ name: browser-extension
+ description: >
+   Browser extension overlay: injects extension domain knowledge into existing
+   pipeline steps for manifest configuration, content scripts, service workers,
+   cross-browser compatibility, store submission, and security patterns.
+ project-type: browser-extension
+
+ # ---------------------------------------------------------------------------
+ # knowledge-overrides
+ # ---------------------------------------------------------------------------
+ # Map browser-extension knowledge entries into existing pipeline steps so that
+ # browser extension domain expertise is injected during prompt assembly.
+ # Includes UX/design steps (extensions have UI).
+ # Maps: manifest → tech-stack + coding-standards; content-scripts → security;
+ # service-workers → system-architecture; cross-browser → tdd + add-e2e-testing;
+ # store-submission → operations.
+ knowledge-overrides:
+   # Foundational
+   create-prd:
+     append: [browser-extension-requirements]
+   user-stories:
+     append: [browser-extension-requirements]
+   # manifest → coding-standards
+   coding-standards:
+     append: [browser-extension-conventions, browser-extension-manifest]
+   project-structure:
+     append: [browser-extension-project-structure]
+   dev-env-setup:
+     append: [browser-extension-dev-environment]
+   git-workflow:
+     append: [browser-extension-conventions]
+
+   # Architecture & Design
+   # service-workers → system-architecture
+   system-architecture:
+     append: [browser-extension-architecture, browser-extension-service-workers]
+   # manifest → tech-stack
+   tech-stack:
+     append: [browser-extension-architecture, browser-extension-manifest]
+   adrs:
+     append: [browser-extension-architecture]
+   domain-modeling:
+     append: [browser-extension-architecture]
+   # UX/design steps (extensions have UI)
+   ux-spec:
+     append: [browser-extension-architecture]
+   design-system:
+     append: [browser-extension-conventions]
+   # content-scripts → security
+   security:
+     append: [browser-extension-security, browser-extension-content-scripts]
+   # store-submission → operations
+   operations:
+     append: [browser-extension-store-submission]
+
+   # Testing
+   # cross-browser → tdd + add-e2e-testing
+   tdd:
+     append: [browser-extension-testing, browser-extension-cross-browser]
+   add-e2e-testing:
+     append: [browser-extension-testing, browser-extension-cross-browser]
+   create-evals:
+     append: [browser-extension-testing]
+   story-tests:
+     append: [browser-extension-testing]
+
+   # Reviews (mirror authoring steps)
+   review-architecture:
+     append: [browser-extension-architecture, browser-extension-service-workers]
+   review-ux:
+     append: [browser-extension-architecture]
+   review-security:
+     append: [browser-extension-security, browser-extension-content-scripts]
+   review-operations:
+     append: [browser-extension-store-submission]
+   review-testing:
+     append: [browser-extension-testing, browser-extension-cross-browser]
+
+   # Planning
+   implementation-plan:
+     append: [browser-extension-architecture]