@variantlab/core 0.1.4 → 0.1.6
# Crash-triggered automatic rollback

A variant that crashes should not keep crashing your users. variantlab's crash-rollback feature detects errors in a variant, clears the user's assignment, and forces the default variant.

## Table of contents

- [Why this exists](#why-this-exists)
- [How it works](#how-it-works)
- [Configuration](#configuration)
- [The VariantErrorBoundary](#the-varianterrorboundary)
- [Programmatic reporting](#programmatic-reporting)
- [Persistence](#persistence)
- [Events](#events)
- [Debugging a rollback](#debugging-a-rollback)
- [Limitations](#limitations)
- [Best practices](#best-practices)
- [See also](#see-also)

---

## Why this exists

Real-world story from Drishtikon: we shipped a new card layout that crashed when a particular article had no hero image. The crash affected 100% of users in that variant until we shipped a new config.

With crash-rollback and a threshold of 2, the second crash would have automatically cleared the assignment and forced the default. No user would have seen more than two crashes.

This is the safety net that lets teams ship risky experiments with confidence.

---

## How it works

```
┌─────────────┐     ┌───────────────┐     ┌─────────────┐
│  Variant A  │────▶│ ErrorBoundary │────▶│ reportCrash │
└─────────────┘     └───────────────┘     └─────────────┘
                                                 │
                                                 ▼
                                        ┌──────────────────┐
                                        │  Count crashes   │
                                        │  in time window  │
                                        └──────────────────┘
                                                 │
                                                 ▼
                                    ┌──────────────────────────┐
                                    │  Threshold reached?      │
                                    │  Yes → rollback          │
                                    │  No  → keep trying       │
                                    └──────────────────────────┘
                                                 │
                                                 ▼
                                    ┌──────────────────────────┐
                                    │  Clear assignment        │
                                    │  Force default           │
                                    │  Emit rollback event     │
                                    │  Persist (optional)      │
                                    └──────────────────────────┘
```

1. You wrap the variant in `<VariantErrorBoundary>`
2. A crash is caught and reported to the engine
3. The engine maintains a per-(experiment, variant, user) crash counter within a sliding window
4. When `threshold` crashes occur within `window` ms, the engine:
   - Clears the user's assignment for that experiment
   - Forces the `default` variant on the next resolve
   - Emits a `rollback` event
   - Optionally persists the rollback so future sessions stay on default
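
The counting step can be sketched as a sliding window of crash timestamps per (experiment, variant, user) key. This is a minimal illustration under stated assumptions, not variantlab's implementation: `CrashTracker`, `record`, and `reset` are hypothetical names, and the caller passes the current time so the logic stays deterministic. Expired timestamps are pruned on every report, so memory stays proportional to the crashes still inside the window.

```typescript
// Hypothetical sketch of per-(experiment, variant, user) sliding-window crash counting.
// Not variantlab's actual implementation; all names are illustrative.
interface WindowConfig {
  threshold: number; // crashes needed to trigger a rollback
  window: number; // sliding window length in milliseconds
}

class CrashTracker {
  private readonly config: WindowConfig;
  // Crash timestamps for each (experiment, variant, user) key.
  private readonly crashes = new Map<string, number[]>();

  constructor(config: WindowConfig) {
    this.config = config;
  }

  private key(experimentId: string, variantId: string, userId: string): string {
    // JSON encoding avoids collisions when IDs contain separator characters.
    return JSON.stringify([experimentId, variantId, userId]);
  }

  /** Records one crash at time `now` (ms) and returns true once the threshold is reached. */
  record(experimentId: string, variantId: string, userId: string, now: number): boolean {
    const key = this.key(experimentId, variantId, userId);
    const cutoff = now - this.config.window;
    // Keep only crashes still inside the window, then add this one.
    const recent = (this.crashes.get(key) ?? []).filter((t) => t > cutoff);
    recent.push(now);
    this.crashes.set(key, recent);
    return recent.length >= this.config.threshold;
  }

  /** Forgets the crash history for a key, e.g. after the rollback has been applied. */
  reset(experimentId: string, variantId: string, userId: string): void {
    this.crashes.delete(this.key(experimentId, variantId, userId));
  }
}

// Three crashes 10 seconds apart, with threshold 3 and a 60-second window.
const tracker = new CrashTracker({ threshold: 3, window: 60_000 });
tracker.record("news-card-layout", "new-risky-layout", "user-123", 0); // false
tracker.record("news-card-layout", "new-risky-layout", "user-123", 10_000); // false
tracker.record("news-card-layout", "new-risky-layout", "user-123", 20_000); // true: roll back
```
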

---

## Configuration

Rollback is configured per experiment, in `experiments.json`:

```json
{
  "id": "news-card-layout",
  "default": "responsive",
  "variants": [
    { "id": "responsive" },
    { "id": "new-risky-layout" }
  ],
  "rollback": {
    "threshold": 3,
    "window": 60000,
    "persistent": false
  }
}
```

### `threshold`

Number of crashes within `window` that triggers a rollback. Integer, 1 to 100. Default: 3.

- **Lower** = more aggressive (rolls back after fewer crashes)
- **Higher** = more forgiving (tolerates occasional flaky crashes)

For high-risk experiments, start at 2. For low-risk ones, 5 to 10 is fine.

### `window`

Length of the sliding window, in milliseconds. Integer, 1000 to 3600000 (1 s to 1 h). Default: 60000 (1 minute).

Crashes older than the window don't count toward the threshold. A variant that crashes once an hour is probably fine; a variant that crashes 3 times in 60 seconds is not.

### `persistent`

`false`, `true`, or `"forever"`. Default: `false`.

- **`false`**: the rollback is in-memory only. After an app restart, the user may be assigned the variant again.
- **`true`**: the rollback is written to Storage, so the user stays on the default even after an app restart.
- **`"forever"`**: like `true`, but the rollback also survives config updates (see [Persistence](#persistence)).

Persistent rollbacks (`true`) are cleared when the engine loads a new config version, on the assumption that it contains the fix.

### Global defaults

You can set engine-level defaults:

```ts
createEngine(config, {
  rollbackDefaults: {
    threshold: 3,
    window: 60000,
    persistent: false,
  },
});
```

Per-experiment config overrides the defaults.
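
One way to picture how the three layers combine (built-in defaults, then `rollbackDefaults`, then the experiment's own `rollback` block) is the sketch below. `resolveRollback` and `BUILT_IN_DEFAULTS` are hypothetical names, not part of variantlab's API; the range checks mirror the limits documented above for `threshold` and `window`.

```typescript
// Hypothetical sketch of resolving rollback settings; not variantlab's actual code.
type Persistent = boolean | "forever";

interface RollbackOptions {
  threshold: number;
  window: number;
  persistent: Persistent;
}

// The documented built-in defaults.
const BUILT_IN_DEFAULTS: RollbackOptions = { threshold: 3, window: 60_000, persistent: false };

function resolveRollback(
  engineDefaults: Partial<RollbackOptions> = {},
  experiment: Partial<RollbackOptions> = {},
): RollbackOptions {
  // Later spreads win: per-experiment > engine-level rollbackDefaults > built-ins.
  const merged: RollbackOptions = { ...BUILT_IN_DEFAULTS, ...engineDefaults, ...experiment };
  if (!Number.isInteger(merged.threshold) || merged.threshold < 1 || merged.threshold > 100) {
    throw new RangeError(`threshold must be an integer from 1 to 100, got ${merged.threshold}`);
  }
  if (!Number.isInteger(merged.window) || merged.window < 1_000 || merged.window > 3_600_000) {
    throw new RangeError(`window must be an integer from 1000 to 3600000 ms, got ${merged.window}`);
  }
  return merged;
}

resolveRollback({ threshold: 5 }, { threshold: 2 });
// → { threshold: 2, window: 60000, persistent: false }
```
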

---

## The VariantErrorBoundary

Wrap your variant subtree in an error boundary:

```tsx
import { VariantErrorBoundary, Variant } from "@variantlab/react";

<VariantErrorBoundary experimentId="news-card-layout">
  <Variant experimentId="news-card-layout">
    {{
      responsive: <ResponsiveCard />,
      "new-risky-layout": <NewRiskyLayout />,
    }}
  </Variant>
</VariantErrorBoundary>
```

When the child throws:

1. The boundary catches the error
2. It calls `engine.reportCrash(experimentId, variantId, error)`
3. It renders a fallback (either the default variant or your custom fallback)
4. If the rollback threshold is hit, the engine forces the default

### Custom fallback

The `fallback` function receives the caught error and the ID of the variant that crashed:

```tsx
<VariantErrorBoundary
  experimentId="news-card-layout"
  fallback={(error, variantId) => (
    <ErrorCard message="Something went wrong" />
  )}
>
  <Variant experimentId="news-card-layout">...</Variant>
</VariantErrorBoundary>
```

### Rendering the default on error

Pass `renderDefaultOnCrash` to have the boundary re-render the experiment's default variant after a crash:

```tsx
<VariantErrorBoundary experimentId="news-card-layout" renderDefaultOnCrash>
  ...
</VariantErrorBoundary>
```

This is the recommended pattern: the user sees the working default instead of a generic error.

### Nested boundaries

You can nest multiple boundaries for different experiments. Each one is scoped to its `experimentId`:

```tsx
<VariantErrorBoundary experimentId="top-nav">
  <TopNavExperiment />
  <VariantErrorBoundary experimentId="card-layout">
    <CardLayoutExperiment />
  </VariantErrorBoundary>
</VariantErrorBoundary>
```

A crash in `card-layout` won't trigger a rollback on `top-nav`.

---

## Programmatic reporting

If you can't use an error boundary (e.g., async errors, errors in event handlers, worker errors), report the crash directly:

```ts
import { useVariantLabEngine } from "@variantlab/react";

// Hooks can only be called inside a component or another hook.
// `currentVariant` is the variant ID the user is currently assigned.
function useRiskyAction(currentVariant: string) {
  const engine = useVariantLabEngine();

  return async () => {
    try {
      await riskyOperation(); // your own async work
    } catch (error) {
      engine.reportCrash("news-card-layout", currentVariant, error);
      throw error; // rethrow so your normal error handling still runs
    }
  };
}
```

### Global error handler

```ts
// `engine` is your variantlab engine instance. Only errors that carry
// variantlabExperimentId / variantlabVariantId properties are reported.
window.addEventListener("error", (event) => {
  const error = event.error;
  if (error?.variantlabExperimentId) {
    engine.reportCrash(
      error.variantlabExperimentId,
      error.variantlabVariantId,
      error,
    );
  }
});
```

The `error` event does not fire for rejected promises; listen for `unhandledrejection` (the error is on `event.reason`) to cover those.
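
On React Native, use the global `ErrorUtils` handler. Keep a reference to the previous handler and call it, otherwise you replace React Native's default crash handling:

```ts
const previousHandler = ErrorUtils.getGlobalHandler();
ErrorUtils.setGlobalHandler((error, isFatal) => {
  // inspect the error and call engine.reportCrash if applicable
  previousHandler(error, isFatal); // keep React Native's default handling
});
```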

---

## Persistence

When `persistent` is `true` (or `"forever"`), rollback state is stored in the engine's Storage adapter under a key like:

```
@variantlab/rollback:news-card-layout:user-123
```

The value includes:

- The rolled-back variant ID
- The timestamp of the rollback
- The crash count that triggered it
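
A persisted record could be modelled like this. The key format mirrors the example above; the field names `variantId`, `timestamp`, and `crashCount`, and the helpers `rollbackKey`, `serializeRollback`, and `parseRollback`, are hypothetical, not variantlab's actual storage schema. The parser treats corrupt or malformed entries as "no rollback" so a bad Storage value can't crash startup.

```typescript
// Hypothetical record shape: the field names are assumptions; only the three
// pieces of information and the example key format come from this page.
interface PersistedRollback {
  variantId: string; // the variant that was rolled back
  timestamp: number; // when the rollback happened, ms since the Unix epoch
  crashCount: number; // crashes that triggered it
}

// Mirrors the example key "@variantlab/rollback:news-card-layout:user-123".
function rollbackKey(experimentId: string, userId: string): string {
  return `@variantlab/rollback:${experimentId}:${userId}`;
}

function serializeRollback(record: PersistedRollback): string {
  return JSON.stringify(record);
}

// Returns null for missing or malformed values instead of throwing.
function parseRollback(raw: string | null): PersistedRollback | null {
  if (raw === null) return null;
  try {
    const value: unknown = JSON.parse(raw);
    if (typeof value !== "object" || value === null) return null;
    const v = value as Record<string, unknown>;
    if (
      typeof v.variantId === "string" &&
      typeof v.timestamp === "number" &&
      typeof v.crashCount === "number"
    ) {
      return { variantId: v.variantId, timestamp: v.timestamp, crashCount: v.crashCount };
    }
    return null;
  } catch {
    return null; // not valid JSON
  }
}

const raw = serializeRollback({ variantId: "new-risky-layout", timestamp: 1739500000000, crashCount: 3 });
parseRollback(raw)?.variantId; // "new-risky-layout"
```
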

### When is it cleared?

- **On config update**: if the config version changes (i.e., a new config is loaded), the rollback is cleared, on the assumption that the new config includes the fix. This does not apply to `persistent: "forever"`.
- **Manually**: `engine.clearRollback(experimentId)` or via the debug overlay
- **On `resetAll()`**

### Why clear on config update?

The point of a rollback is to protect users until a fix ships. Once the fix is in a new config, the user should be re-enrolled.

If you don't want this behavior, set `persistent: "forever"`:

```json
"rollback": { "threshold": 3, "window": 60000, "persistent": "forever" }
```

This keeps the rollback until it is cleared explicitly, either manually or with `resetAll()`.
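
The clearing rules can be expressed as a small decision function. This is a hypothetical sketch: `shouldKeepRollback` and the `configVersion` field on the stored record are assumptions (this page only says the rollback is cleared when the config version changes), and manual clearing via `clearRollback()` or `resetAll()` is left out.

```typescript
// Hypothetical sketch of the clearing rules above; not variantlab's actual code.
type Persistent = boolean | "forever";

interface StoredRollback {
  // Assumption: the config version that was active when the rollback happened.
  configVersion: string;
}

/** Decides whether a stored rollback still applies after a config (re)load. */
function shouldKeepRollback(
  stored: StoredRollback,
  persistent: Persistent,
  currentConfigVersion: string,
): boolean {
  if (persistent === false) return false; // in-memory only: nothing survives a restart
  if (persistent === "forever") return true; // kept until clearRollback() or resetAll()
  // persistent === true: a new config version is assumed to contain the fix.
  return stored.configVersion === currentConfigVersion;
}

shouldKeepRollback({ configVersion: "v7" }, true, "v7"); // true: same config
shouldKeepRollback({ configVersion: "v7" }, true, "v8"); // false: re-enroll
shouldKeepRollback({ configVersion: "v7" }, "forever", "v8"); // true
```
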

---

## Events

Rollbacks fire an event on the engine's event bus:

```ts
engine.on("rollback", (event) => {
  console.log("Rollback triggered:", event);
  // {
  //   experimentId: "news-card-layout",
  //   variantId: "new-risky-layout",
  //   reason: "threshold-exceeded",
  //   crashCount: 3,
  //   window: 60000,
  //   userId: "user-123",
  //   timestamp: 1739500000000,
  // }
});
```

Use this to:

- Forward to your telemetry (e.g., Sentry, Datadog)
- Alert on-call via PagerDuty
- Increment a rollback counter in your metrics pipeline
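
For the metrics use case, a handler can keep a running count per experiment and variant and forward each event. The `RollbackEvent` interface copies the payload shown above, and `Track` matches the `telemetry.track(event, properties)` callback from the Telemetry integration section; `createRollbackForwarder` and the default event name `variantlab_rollback` are hypothetical.

```typescript
// Payload shape copied from the example above; treat it as an assumption and
// check API.md for the authoritative type.
interface RollbackEvent {
  experimentId: string;
  variantId: string;
  reason: string;
  crashCount: number;
  window: number;
  userId: string;
  timestamp: number;
}

// Same shape as the telemetry `track(event, properties)` callback.
type Track = (event: string, properties: Record<string, unknown>) => void;

// Hypothetical helper: keeps a per-(experiment, variant) rollback counter and
// forwards each event with the running count attached. Returns the new count.
function createRollbackForwarder(track: Track, eventName = "variantlab_rollback") {
  const counts = new Map<string, number>();
  return (event: RollbackEvent): number => {
    const key = JSON.stringify([event.experimentId, event.variantId]);
    const count = (counts.get(key) ?? 0) + 1;
    counts.set(key, count);
    track(eventName, { ...event, rollbackCount: count });
    return count;
  };
}
```

Wire it up with `engine.on("rollback", createRollbackForwarder(track))`, where `track` sends events to your backend. The counts live in memory and reset when the app restarts, so rely on your metrics backend for durable totals.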

### Telemetry integration

If you configure a telemetry provider, rollback events are forwarded to it automatically:

```ts
import posthog from "posthog-js"; // assumes posthog.init() has already run

createEngine(config, {
  telemetry: {
    // Receives engine events, including rollbacks.
    track(event, properties) {
      posthog.capture(event, properties);
    },
  },
});
```

---

## Debugging a rollback

### Debug overlay

The overlay's **Events** tab shows rollback events in real time. Each entry has:

- Experiment ID
- Variant that crashed
- Crash count
- Timestamp
- "View stack trace" button

### Crash history

The overlay's **History** tab shows past rollbacks with timestamps. See [`time-travel.md`](./time-travel.md).

### Programmatic inspection

```ts
const rollbacks = engine.getRollbacks();
// [
//   { experimentId: "news-card-layout", variantId: "new-risky-layout", ... }
// ]
```

### Manually clearing

From the overlay: tap the experiment card → overflow menu → "Clear rollback".

Programmatically:

```ts
engine.clearRollback("news-card-layout");
// or
engine.resetAll();
```

---

## Limitations

### What we can detect

- **React render errors**: caught by `VariantErrorBoundary`
- **Event handler errors**: not caught by error boundaries in any React version; wrap the handler in try/catch and report manually
- **Promise rejections inside effects**: caught if you wrap them in try/catch and report manually
- **Global JS errors**: caught if you register a global handler and report manually

### What we cannot detect

- **Native crashes** (iOS/Android native layer): these bypass JavaScript entirely
- **Memory leaks** that don't throw: not an error, just slow
- **Infinite loops**: the app hangs, and no error is thrown
- **Crashes in the engine itself**: if variantlab's code crashes, the rollback system can't run

For native crashes and crashes in the engine itself, pair variantlab with Sentry or Crashlytics to get crash reports, and manually update your config when you see a surge.

### False positives

A variant that legitimately errors once in a while (e.g., transient network failures) may trigger a rollback even though the variant itself is fine. Mitigations:

- Use a higher `threshold`
- Use a shorter `window`, so that occasional errors spread over time don't add up to the threshold
- Don't report network errors as variant crashes
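
To act on the last point, filter errors before reporting them. The heuristic below is a hypothetical sketch (`isLikelyNetworkError` and `shouldReportAsCrash` are not variantlab APIs): it treats aborted or timed-out requests and `fetch()` failures as network errors. Failure messages differ between runtimes, so adjust the pattern to the platforms you ship on.

```typescript
// Hypothetical heuristic; tune it to how your networking layer actually fails.
function isLikelyNetworkError(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const { name, message } = error as { name?: unknown; message?: unknown };
  // Aborted or timed-out requests (e.g. AbortController, AbortSignal.timeout()).
  if (name === "AbortError" || name === "TimeoutError") return true;
  // fetch() rejects with a TypeError whose message varies by runtime: "Failed to fetch"
  // (Chrome), "NetworkError when attempting to fetch resource." (Firefox), "Load failed"
  // (Safari), "fetch failed" (Node.js), "Network request failed" (React Native).
  return (
    name === "TypeError" &&
    typeof message === "string" &&
    /fetch|network|load failed/i.test(message)
  );
}

function shouldReportAsCrash(error: unknown): boolean {
  return !isLikelyNetworkError(error);
}
```

Call `shouldReportAsCrash(error)` in your `catch` block before `engine.reportCrash(...)`. The message match is deliberately loose, so an unrelated `TypeError` whose message mentions "fetch" would also be skipped.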

### Race conditions

If the same user has the same experiment rendered in 3 places and all 3 crash at once, that produces 3 crash reports within milliseconds. The threshold counts individual reports, so 3 concurrent reports count as 3 crashes. This is usually desired: 3 crashes in one frame are worse than 3 crashes spread over a minute.

---

## Best practices

1. **Wrap risky experiments**: always use `VariantErrorBoundary` around experimental components
2. **Tune thresholds**: higher for low-risk experiments, lower for high-risk ones
3. **Always set a working default**: the whole point of rollback is to fall back to the default
4. **Forward to telemetry**: don't just roll back silently; alert the team
5. **Test the rollback**: in dev, inject a crash and verify the rollback triggers
6. **Clear rollbacks on fix**: ship a new config version and the rollback clears automatically (unless `persistent` is `"forever"`)
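
For point 5, a throw-on-demand component is one way to inject the crash. `CrashOnDemand` is a hypothetical dev-only helper, not part of `@variantlab/react`. Render it inside the risky variant and trigger it `threshold` times within `window` (for example by navigating away and back between crashes, since the boundary shows its fallback after each one), then check that `engine.getRollbacks()` lists the experiment.

```typescript
// Hypothetical dev-only helper. Render it inside the risky variant, e.g.
// <CrashOnDemand enabled={__DEV__} /> in React Native, to exercise the boundary.
function CrashOnDemand({ enabled }: { enabled: boolean }): null {
  if (enabled) {
    // Throwing during render is the kind of error an error boundary catches.
    throw new Error("variantlab: injected crash to test rollback");
  }
  return null;
}
```
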

---

## See also

- [`API.md`](../API.md): `VariantErrorBoundary` props
- [`origin-story.md`](../research/origin-story.md): the crashing card that inspired this feature
- [`hmac-signing.md`](./hmac-signing.md): making sure rollback configs aren't tampered with