bulletin-deploy 0.5.0 → 0.5.2

@@ -0,0 +1,453 @@
1
+ # Error Telemetry Implementation Plan
2
+
3
+ > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
4
+
5
+ **Goal:** Make deploy failures visible in Sentry Issues with full context, enabling alerting and dashboard-driven observability.
6
+
7
+ **Architecture:** Enrich existing `withSpan`/`withDeploySpan` wrappers in telemetry.js with `captureException` + error attributes inside the `startSpan` callback. Add a `captureWarning` helper for transient failures at retry/fallback call sites. Close a span coverage gap in the DotNS registration flow.
8
+
9
+ **Tech Stack:** `@sentry/node` ^9.14.0, Node.js `node:test` runner
10
+
11
+ **Spec:** `docs/superpowers/specs/2026-03-24-error-telemetry-design.md`
12
+
13
+ ---
14
+
15
+ ## File Structure
16
+
17
+ | File | Role | Change |
18
+ |------|------|--------|
19
+ | `src/telemetry.js` | Telemetry wrappers | Modify `withSpan`, `withDeploySpan`; add `captureWarning` |
20
+ | `src/deploy.js` | Deploy flow / chunk uploads | Add `captureWarning` import and calls in chunk retry loop |
21
+ | `src/dotns.js` | DotNS registration | Add `captureWarning` import and call in RPC fallback; wrap `getPriceAndValidate` in span; renumber finalize span |
22
+ | `test/test.js` | Unit tests | Add tests for `captureWarning`, `withSpan` error propagation, `withDeploySpan` error propagation |
23
+
24
+ ---
25
+
26
+ ### Task 1: Add `captureWarning` helper
27
+
28
+ **Files:**
29
+ - Modify: `src/telemetry.js:70-73`
30
+ - Test: `test/test.js`
31
+
32
+ - [ ] **Step 1: Write the failing test**
33
+
34
+ Add to `test/test.js`: import `captureWarning` and test the no-op path (`Sentry` is null in the test environment because `BULLETIN_DEPLOY_TELEMETRY=0` is set or `@sentry/node` was never initialized):
35
+
36
+ ```js
37
+ import { captureWarning } from "../src/telemetry.js";
38
+
39
+ // ---------------------------------------------------------------------------
40
+ // 6. captureWarning
41
+ // ---------------------------------------------------------------------------
42
+ describe("captureWarning", () => {
43
+   test("does not throw when Sentry is disabled", () => {
44
+     assert.doesNotThrow(() => captureWarning("test warning", { key: "value" }));
45
+   });
46
+ });
47
+ ```
48
+
49
+ - [ ] **Step 2: Run test to verify it fails**
50
+
51
+ Run: `BULLETIN_DEPLOY_TELEMETRY=0 node --test test/test.js`
52
+ Expected: FAIL — `captureWarning` is not exported from telemetry.js
53
+
54
+ - [ ] **Step 3: Implement `captureWarning`**
55
+
56
+ Add immediately before the `flush` export at the bottom of `src/telemetry.js`:
57
+
58
+ ```js
59
+ export function captureWarning(message, context) {
60
+   if (!Sentry) return;
61
+   try {
62
+     Sentry.addBreadcrumb({ level: "warning", message, data: context });
63
+     Sentry.captureMessage(message, { level: "warning", extra: context });
64
+   } catch { /* telemetry must never break the deploy */ }
65
+ }
66
+ ```
67
+
68
+ - [ ] **Step 4: Run test to verify it passes**
69
+
70
+ Run: `BULLETIN_DEPLOY_TELEMETRY=0 node --test test/test.js`
71
+ Expected: ALL PASS
72
+
73
+ - [ ] **Step 5: Commit**
74
+
75
+ ```bash
76
+ git add src/telemetry.js test/test.js
77
+ git commit -m "feat(telemetry): add captureWarning helper for transient failures"
78
+ ```
79
+
80
+ ---
81
+
82
+ ### Task 2: Add error capture to `withSpan`
83
+
84
+ **Files:**
85
+ - Modify: `src/telemetry.js` — the `withSpan` function
86
+ - Test: `test/test.js`
87
+
88
+ **Note:** Line numbers in this plan refer to the original, unmodified files. Tasks 1-3 all modify `src/telemetry.js` in sequence, so locate code by function name rather than by line number.
89
+
90
+ - [ ] **Step 1: Write the baseline test**
91
+
92
+ Add to `test/test.js` — import `withSpan` and verify errors propagate correctly:
93
+
94
+ ```js
95
+ import { withSpan } from "../src/telemetry.js";
96
+
97
+ // ---------------------------------------------------------------------------
98
+ // 7. withSpan error propagation
99
+ // ---------------------------------------------------------------------------
100
+ describe("withSpan", () => {
101
+   test("propagates errors from the callback", async () => {
102
+     await assert.rejects(
103
+       () => withSpan("test.op", "test span", {}, () => { throw new Error("span error"); }),
104
+       { message: "span error" }
105
+     );
106
+   });
107
+
108
+   test("returns the callback result on success", async () => {
109
+     const result = await withSpan("test.op", "test span", {}, () => "ok");
110
+     assert.strictEqual(result, "ok");
111
+   });
112
+ });
113
+ ```
114
+
115
+ - [ ] **Step 2: Run test to verify it passes (baseline)**
116
+
117
+ Run: `BULLETIN_DEPLOY_TELEMETRY=0 node --test test/test.js`
118
+ Expected: PASS — current `withSpan` already propagates errors in the no-Sentry path. This test locks in existing behavior before we change the implementation.
119
+
120
+ - [ ] **Step 3: Modify `withSpan` to capture errors**
121
+
122
+ Replace the `withSpan` function in `src/telemetry.js`. This is the primary place that calls `captureException` for phase errors. To avoid double-capture, `withDeploySpan` (Task 3) only calls `captureException` for errors that no inner `withSpan` has already captured. The `deploy.phase` tag is set here so error events carry the correct phase for dashboard queries.
123
+
124
+ ```js
125
+ export async function withSpan(op, description, attributes, fn) {
126
+   if (!Sentry) return fn();
127
+   return Sentry.startSpan({ op, name: description, attributes }, async (span) => {
128
+     try {
129
+       return await fn();
130
+     } catch (error) {
131
+       span.setAttribute("error.message", error.message);
132
+       Sentry.setTag("deploy.phase", op);
133
+       Sentry.captureException(error);
134
+       error._sentryCaptured = true;
135
+       throw error;
136
+     }
137
+   });
138
+ }
139
+ ```
140
+
141
+ The `error._sentryCaptured = true` flag prevents `withDeploySpan` from capturing the same error a second time.
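The guard can be exercised without `@sentry/node` by re-creating the two wrappers against a fake SDK. This is a minimal sketch, not the shipped code: `createWrappers` and `fakeSentry` are hypothetical test helpers, the fake `startSpan` only hands the callback a span stub (the real SDK also ends the span and sets its status), and `getDeployAttributes`, `setTags`, and `flush` are left out. It checks that a failure inside a phase span is captured once with that phase's tag, and that a failure between spans is captured once by the outer wrapper.

```javascript
// Hypothetical helpers: the Task 2/3 catch logic re-created against an injected SDK.
function createWrappers(Sentry) {
  async function withSpan(op, description, attributes, fn) {
    return Sentry.startSpan({ op, name: description, attributes }, async (span) => {
      try {
        return await fn();
      } catch (error) {
        span.setAttribute("error.message", error.message);
        Sentry.setTag("deploy.phase", op);
        Sentry.captureException(error);
        error._sentryCaptured = true; // tells the outer wrapper not to capture again
        throw error;
      }
    });
  }

  async function withDeploySpan(domain, fn) {
    return Sentry.startSpan({ op: "deploy", name: `deploy ${domain}` }, async (span) => {
      try {
        return await fn();
      } catch (error) {
        span.setAttribute("deploy.status", "error");
        span.setAttribute("deploy.error", error.message);
        if (!error._sentryCaptured) {
          Sentry.setTag("deploy.phase", "deploy");
          Sentry.captureException(error);
        }
        throw error;
      }
    });
  }

  return { withSpan, withDeploySpan };
}

// Fake SDK: records each captured error together with the phase tag at capture time.
function fakeSentry() {
  const tags = {};
  const captured = [];
  return {
    captured,
    setTag: (key, value) => { tags[key] = value; },
    captureException: (error) => { captured.push({ message: error.message, phase: tags["deploy.phase"] }); },
    startSpan: (_options, callback) => callback({ setAttribute() {} }),
  };
}

async function demo() {
  // Failure inside a phase span: captured once, by withSpan.
  const inner = fakeSentry();
  const w1 = createWrappers(inner);
  await w1
    .withDeploySpan("example", () =>
      w1.withSpan("deploy.dotns.price-validation", "2a-iii. price-validation", {}, () => {
        throw new Error("label rejected");
      }),
    )
    .catch(() => {});

  // Failure outside any phase span: captured once, by withDeploySpan.
  const outer = fakeSentry();
  const w2 = createWrappers(outer);
  await w2.withDeploySpan("example", () => { throw new Error("between spans"); }).catch(() => {});

  return { inner: inner.captured, outer: outer.captured };
}

const result = demo();
result.then((r) => console.log(JSON.stringify(r)));
```

Injecting the SDK is what makes this testable. The real wrappers read a module-level `Sentry`, so exercising them this way would need a small refactor.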
142
+
143
+ - [ ] **Step 4: Run test to verify it still passes**
144
+
145
+ Run: `BULLETIN_DEPLOY_TELEMETRY=0 node --test test/test.js`
146
+ Expected: ALL PASS — behavior unchanged in the no-Sentry path; the new try/catch only activates when Sentry is loaded.
147
+
148
+ - [ ] **Step 5: Commit**
149
+
150
+ ```bash
151
+ git add src/telemetry.js test/test.js
152
+ git commit -m "feat(telemetry): capture exceptions and error attributes in withSpan"
153
+ ```
154
+
155
+ ---
156
+
157
+ ### Task 3: Add error capture and deploy tags to `withDeploySpan`
158
+
159
+ **Files:**
160
+ - Modify: `src/telemetry.js` — the `withDeploySpan` function
161
+ - Test: `test/test.js`
162
+
163
+ - [ ] **Step 1: Write the baseline test**
164
+
165
+ Add to `test/test.js` — import `withDeploySpan`:
166
+
167
+ ```js
168
+ import { withDeploySpan } from "../src/telemetry.js";
169
+
170
+ // ---------------------------------------------------------------------------
171
+ // 8. withDeploySpan error propagation
172
+ // ---------------------------------------------------------------------------
173
+ describe("withDeploySpan", () => {
174
+   test("propagates errors from the callback", async () => {
175
+     await assert.rejects(
176
+       () => withDeploySpan("test-domain", () => { throw new Error("deploy error"); }),
177
+       { message: "deploy error" }
178
+     );
179
+   });
180
+
181
+   test("returns the callback result on success", async () => {
182
+     const result = await withDeploySpan("test-domain", () => "deployed");
183
+     assert.strictEqual(result, "deployed");
184
+   });
185
+ });
186
+ ```
187
+
188
+ - [ ] **Step 2: Run test to verify it passes (baseline)**
189
+
190
+ Run: `BULLETIN_DEPLOY_TELEMETRY=0 node --test test/test.js`
191
+ Expected: PASS — locks in existing behavior.
192
+
193
+ - [ ] **Step 3: Modify `withDeploySpan` to capture errors and set tags**
194
+
195
+ Replace the `withDeploySpan` function in `src/telemetry.js`:
196
+
197
+ ```js
198
+ export async function withDeploySpan(domain, fn) {
199
+   if (!Sentry) return fn();
200
+   const attrs = { ...getDeployAttributes(), "deploy.domain": domain };
201
+   try {
202
+     return await Sentry.startSpan({ op: "deploy", name: `deploy ${domain}`, attributes: attrs }, async (span) => {
203
+       Sentry.setTags({
204
+         "deploy.repo": attrs["deploy.repo"],
205
+         "deploy.branch": attrs["deploy.branch"],
206
+         "deploy.domain": domain,
207
+         "deploy.source": attrs["deploy.source"],
208
+       });
209
+       try {
210
+         return await fn();
211
+       } catch (error) {
212
+         span.setAttribute("deploy.status", "error");
213
+         span.setAttribute("deploy.error", error.message);
214
+         if (!error._sentryCaptured) {
215
+           Sentry.setTag("deploy.phase", "deploy");
216
+           Sentry.captureException(error);
217
+         }
218
+         throw error;
219
+       }
220
+     });
221
+   } finally {
222
+     await Sentry.flush(5000);
223
+   }
224
+ }
225
+ ```
226
+
227
+ Key points:
228
+ - `Sentry.setTags` at span start ensures all error events within the deploy carry `deploy.repo`, `deploy.branch`, `deploy.domain`, and `deploy.source`
229
+ - **No double capture**: The `_sentryCaptured` check ensures errors already captured by an inner `withSpan` are not captured again. Only errors that bypass inner spans (e.g. code between spans) get captured here with `deploy.phase = "deploy"`.
230
+ - The span still gets `deploy.status` and `deploy.error` attributes regardless — useful for trace view.
231
+ - The outer `try/finally` with `flush` is unchanged.
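Both wrappers assume the thrown value is an `Error` object. In an ES module (strict mode), `error._sentryCaptured = true` throws a `TypeError` when the thrown value is a primitive such as a string, which would replace the original error. `error.message` also throws if `null` or `undefined` is thrown. One option, sketched below and not part of the plan's design, is to track captured errors in a `WeakSet` instead of mutating them. `errorMessage`, `markCaptured`, `wasCaptured`, and `handlePhaseError` are hypothetical names. The sketch also passes the phase tag through the capture-context second argument that the Sentry JS SDK's `captureException` accepts. This applies the tag to that one event instead of leaving `deploy.phase` set on the scope for later events.

```javascript
// Hypothetical helpers for a hardened catch block; not exports of telemetry.js.
const capturedErrors = new WeakSet();

function isObjectLike(value) {
  return value !== null && (typeof value === "object" || typeof value === "function");
}

// Safe message extraction for any thrown value (Error, plain object, string, number, ...).
function errorMessage(error) {
  if (isObjectLike(error) && typeof error.message === "string") return error.message;
  return String(error);
}

// WeakSet entries must be objects, so thrown primitives are simply not tracked.
function markCaptured(error) {
  if (isObjectLike(error)) capturedErrors.add(error);
}

function wasCaptured(error) {
  return isObjectLike(error) && capturedErrors.has(error);
}

// Catch-block body for withSpan, rewritten with the helpers above. Always rethrows.
function handlePhaseError(Sentry, span, op, error) {
  span.setAttribute("error.message", errorMessage(error));
  // Tag only this event instead of leaving deploy.phase set on the scope.
  Sentry.captureException(error, { tags: { "deploy.phase": op } });
  markCaptured(error);
  throw error;
}

// Usage with a recording fake SDK and a span stub.
const events = [];
const fakeSdk = { captureException: (err, context) => { events.push({ err, tags: context.tags }); } };
const spanStub = { attrs: {}, setAttribute(key, value) { this.attrs[key] = value; } };

for (const thrown of [new Error("rpc timeout"), "boom"]) {
  try {
    handlePhaseError(fakeSdk, spanStub, "deploy.dotns.price-validation", thrown);
  } catch (rethrown) {
    console.log(rethrown === thrown, wasCaptured(rethrown), spanStub.attrs["error.message"]);
  }
}
// Logs `true true rpc timeout`, then `true false boom`.
```

The original value is always rethrown unchanged. A thrown primitive can still be captured twice by nested wrappers, which is the trade-off for never mutating or wrapping it.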
232
+
233
+ - [ ] **Step 4: Run test to verify it still passes**
234
+
235
+ Run: `BULLETIN_DEPLOY_TELEMETRY=0 node --test test/test.js`
236
+ Expected: ALL PASS
237
+
238
+ - [ ] **Step 5: Commit**
239
+
240
+ ```bash
241
+ git add src/telemetry.js test/test.js
242
+ git commit -m "feat(telemetry): capture exceptions and deploy tags in withDeploySpan"
243
+ ```
244
+
245
+ ---
246
+
247
+ ### Task 4: Add chunk retry warnings in deploy.js
248
+
249
+ **Files:**
250
+ - Modify: `src/deploy.js:20` (import)
251
+ - Modify: `src/deploy.js:201-214` (chunk retry loop)
252
+
253
+ - [ ] **Step 1: Add `captureWarning` to the import**
254
+
255
+ In `src/deploy.js` line 20, change:
256
+
257
+ ```js
258
+ import { initTelemetry, withSpan, withDeploySpan, setDeployAttribute, flush } from "./telemetry.js";
259
+ ```
260
+
261
+ to:
262
+
263
+ ```js
264
+ import { initTelemetry, withSpan, withDeploySpan, setDeployAttribute, captureWarning, flush } from "./telemetry.js";
265
+ ```
266
+
267
+ - [ ] **Step 2: Add warning at initial batch failure detection**
268
+
269
+ In `src/deploy.js`, inside the `for (const fail of failures)` loop body (after line 201), add a `captureWarning` call before the retry loop:
270
+
271
+ ```js
272
+ for (const fail of failures) {
273
+   captureWarning("Chunk upload failed, retrying", { chunkIndex: fail.index + 1, maxRetries: MAX_CHUNK_RETRIES, error: fail.error?.message?.slice(0, 200) });
274
+   let retried = false;
275
+ ```
276
+
277
+ - [ ] **Step 3: Add warning on individual retry failures**
278
+
279
+ In `src/deploy.js`, inside the retry catch block (after line 213), add:
280
+
281
+ ```js
282
+ } catch (e) {
283
+   captureWarning("Chunk retry failed", { chunkIndex: fail.index + 1, attempt, maxRetries: MAX_CHUNK_RETRIES, error: e.message?.slice(0, 200) });
284
+   console.log(` Retry ${attempt} failed: ${e.message?.slice(0, 80)}`);
285
+ }
286
+ ```
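To make the warning volume concrete, the loop can be modelled in isolation. This sketch makes several assumptions: `failures` holds `{ index, error }` objects, a hypothetical `uploadChunk(index, attempt)` rejects on failure, and the default of 3 retries is a placeholder for `MAX_CHUNK_RETRIES`. `retryFailedChunks` is an illustrative name rather than the real `storeChunkedContent` code, and console logging is omitted. A chunk that fails in the initial batch emits one "Chunk upload failed, retrying" warning, plus one "Chunk retry failed" warning per failed attempt. A chunk that never recovers therefore produces `1 + MAX_CHUNK_RETRIES` warning events.

```javascript
// Hypothetical model of the chunk retry loop; not the real storeChunkedContent code.
async function retryFailedChunks(failures, uploadChunk, captureWarning, maxRetries = 3) {
  const stillFailed = [];
  for (const fail of failures) {
    // One warning per chunk that failed in the initial batch.
    captureWarning("Chunk upload failed, retrying", {
      chunkIndex: fail.index + 1,
      maxRetries,
      error: fail.error?.message?.slice(0, 200),
    });
    let retried = false;
    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        await uploadChunk(fail.index, attempt);
        retried = true;
        break;
      } catch (e) {
        // One more warning for every retry attempt that fails.
        captureWarning("Chunk retry failed", {
          chunkIndex: fail.index + 1,
          attempt,
          maxRetries,
          error: e.message?.slice(0, 200),
        });
      }
    }
    if (!retried) stillFailed.push(fail.index);
  }
  return stillFailed;
}

// Usage: chunk 4 (index 3) fails its first retry and succeeds on the second.
const warnings = [];
const recordWarning = (message, context) => warnings.push({ message, ...context });
const flakyUpload = async (index, attempt) => {
  if (attempt < 2) throw new Error(`timeout on chunk ${index + 1}`);
};

retryFailedChunks([{ index: 3, error: new Error("batch timeout") }], flakyUpload, recordWarning)
  .then((stillFailed) => console.log(stillFailed, warnings.map((w) => w.message)));
```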
287
+
288
+ - [ ] **Step 4: Run existing tests to verify no regressions**
289
+
290
+ Run: `BULLETIN_DEPLOY_TELEMETRY=0 node --test test/test.js`
291
+ Expected: ALL PASS
292
+
293
+ - [ ] **Step 5: Commit**
294
+
295
+ ```bash
296
+ git add src/deploy.js
297
+ git commit -m "feat(telemetry): add transient failure warnings for chunk retries"
298
+ ```
299
+
300
+ ---
301
+
302
+ ### Task 5: Add RPC fallback warning in dotns.js
303
+
304
+ **Files:**
305
+ - Modify: `src/dotns.js:23` (import)
306
+ - Modify: `src/dotns.js:319-323` (RPC fallback catch block)
307
+
308
+ - [ ] **Step 1: Add `captureWarning` to the import**
309
+
310
+ In `src/dotns.js` line 23, change:
311
+
312
+ ```js
313
+ import { withSpan } from "./telemetry.js";
314
+ ```
315
+
316
+ to:
317
+
318
+ ```js
319
+ import { withSpan, captureWarning } from "./telemetry.js";
320
+ ```
321
+
322
+ - [ ] **Step 2: Add warning in the RPC endpoint catch block**
323
+
324
+ In `src/dotns.js`, inside the `connect()` catch block, add the `captureWarning` call immediately after `lastError = e;` (line 320):
325
+
326
+ ```js
327
+ } catch (e) {
328
+   lastError = e;
329
+   captureWarning("DotNS RPC endpoint failed, trying next", { endpoint: rpc, error: e.message, remainingEndpoints: endpoints.length - endpoints.indexOf(rpc) - 1 });
330
+   console.log(` Failed to connect to ${rpc}: ${e.message}`);
331
+   if (this.client) { try { this.client.destroy(); } catch {} this.client = null; }
332
+ }
333
+ ```
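The `remainingEndpoints` expression uses `endpoints.indexOf(rpc)`, which returns the first matching position. If the endpoint list ever contained a duplicate URL, the count for the later copy would be too high. Iterating with an index avoids that. This is a sketch under assumptions: `connectFirstAvailable` and `tryConnect` are hypothetical, the `wss://rpc-*.example` URLs are placeholders, and the real `connect()` also creates and destroys a client per endpoint.

```javascript
// Hypothetical model of the connect() fallback loop.
async function connectFirstAvailable(endpoints, tryConnect, captureWarning) {
  let lastError;
  for (const [i, rpc] of endpoints.entries()) {
    try {
      return await tryConnect(rpc);
    } catch (e) {
      lastError = e;
      // Position-based count: still correct if the same URL appears twice,
      // where endpoints.indexOf(rpc) would always find the first copy.
      captureWarning("DotNS RPC endpoint failed, trying next", {
        endpoint: rpc,
        error: e.message,
        remainingEndpoints: endpoints.length - i - 1,
      });
    }
  }
  throw lastError; // every endpoint failed
}

// Usage with placeholder URLs: the first endpoint refuses, the second connects.
const seenWarnings = [];
connectFirstAvailable(
  ["wss://rpc-a.example", "wss://rpc-b.example"],
  async (rpc) => {
    if (rpc === "wss://rpc-a.example") throw new Error("connection refused");
    return `connected to ${rpc}`;
  },
  (message, context) => seenWarnings.push(context),
).then((status) => console.log(status, seenWarnings));
```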
334
+
335
+ - [ ] **Step 3: Run existing tests to verify no regressions**
336
+
337
+ Run: `BULLETIN_DEPLOY_TELEMETRY=0 node --test test/test.js`
338
+ Expected: ALL PASS
339
+
340
+ - [ ] **Step 4: Commit**
341
+
342
+ ```bash
343
+ git add src/dotns.js
344
+ git commit -m "feat(telemetry): add transient failure warning for DotNS RPC fallback"
345
+ ```
346
+
347
+ ---
348
+
349
+ ### Task 6: Close span coverage gap and renumber spans in dotns.js
350
+
351
+ **Files:**
352
+ - Modify: `src/dotns.js:528-529` (register method)
353
+
354
+ - [ ] **Step 1: Wrap `getPriceAndValidate` in a span and renumber `finalize-registration`**
355
+
356
+ In `src/dotns.js`, replace lines 528-529:
357
+
358
+ ```js
359
+ const pricing = await this.getPriceAndValidate(label);
360
+ await withSpan("deploy.dotns.finalize-registration", "2a-iii. finalize-registration", {}, () => this.finalizeRegistration(registration, pricing.priceWei));
361
+ ```
362
+
363
+ with:
364
+
365
+ ```js
366
+ const pricing = await withSpan("deploy.dotns.price-validation", "2a-iii. price-validation", {}, () => this.getPriceAndValidate(label));
367
+ await withSpan("deploy.dotns.finalize-registration", "2a-iv. finalize-registration", {}, () => this.finalizeRegistration(registration, pricing.priceWei));
368
+ ```
369
+
370
+ - [ ] **Step 2: Run existing tests to verify no regressions**
371
+
372
+ Run: `BULLETIN_DEPLOY_TELEMETRY=0 node --test test/test.js`
373
+ Expected: ALL PASS
374
+
375
+ - [ ] **Step 3: Commit**
376
+
377
+ ```bash
378
+ git add src/dotns.js
379
+ git commit -m "feat(telemetry): add price-validation span, renumber finalize to 2a-iv"
380
+ ```
381
+
382
+ ---
383
+
384
+ ### Task 7: Final verification
385
+
386
+ - [ ] **Step 1: Run full test suite**
387
+
388
+ Run: `BULLETIN_DEPLOY_TELEMETRY=0 node --test test/test.js`
389
+ Expected: ALL PASS
390
+
391
+ - [ ] **Step 2: Verify telemetry.js exports are consistent**
392
+
393
+ Run: `node -e "import('./src/telemetry.js').then(m => console.log(Object.keys(m).sort().join(', ')))"`
394
+ Expected: `captureWarning, flush, initTelemetry, setDeployAttribute, withDeploySpan, withSpan`
395
+
396
+ - [ ] **Step 3: Commit all remaining changes (if any)**
397
+
398
+ **Testing limitation note:** All tests run with Sentry disabled (the no-op path). The Sentry-active code paths (captureException, captureMessage, addBreadcrumb, setTags, setAttribute) are only exercised during real deploys. This is acceptable because the no-op guard (`if (!Sentry) return`) means tests verify that non-telemetry behavior is preserved, and the Sentry calls are straightforward SDK usage. Mock-based Sentry tests can be added as a follow-up if needed.
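One way to add the mock-based follow-up is to make the SDK injectable. The sketch below is hypothetical. `telemetry.js` keeps `Sentry` in a module-level variable, so `makeCaptureWarning(Sentry)` is a refactor idea rather than existing code. It checks that the Sentry-active path records a breadcrumb and a message, and that errors thrown by the SDK are swallowed.

```javascript
// Hypothetical factory: same body as captureWarning, with the SDK passed in.
function makeCaptureWarning(Sentry) {
  return function captureWarning(message, context) {
    if (!Sentry) return;
    try {
      Sentry.addBreadcrumb({ level: "warning", message, data: context });
      Sentry.captureMessage(message, { level: "warning", extra: context });
    } catch { /* telemetry must never break the deploy */ }
  };
}

// Recording fake: logs which SDK calls were made, in order.
const calls = [];
const recordingSentry = {
  addBreadcrumb: (crumb) => { calls.push(["breadcrumb", crumb.level, crumb.message]); },
  captureMessage: (msg, ctx) => { calls.push(["message", ctx.level, msg]); },
};
makeCaptureWarning(recordingSentry)("Chunk retry failed", { attempt: 1 });

// An SDK that throws must not propagate the error to the deploy.
const throwingSentry = {
  addBreadcrumb() { throw new Error("SDK exploded"); },
  captureMessage() {},
};
makeCaptureWarning(throwingSentry)("Chunk retry failed", { attempt: 2 });

console.log(calls);
```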
399
+
400
+ ---
401
+
402
+ ### Task 8: Dashboard widgets (manual)
403
+
404
+ This task is performed in the Sentry web UI, not in code.
405
+
406
+ **Dashboard:** `https://polkadot-community-foundation.sentry.io/dashboards/92523/`
407
+
408
+ - [ ] **Step 1: Add "Failure rate over time" widget**
409
+
410
+ Type: Line chart
411
+ Dataset: Errors
412
+ Query: `event.type:error`
413
+ Y-axis: `count()`
414
+ Group by: timestamp (hour)
415
+
416
+ - [ ] **Step 2: Add "Failures by repo" widget**
417
+
418
+ Type: Bar chart
419
+ Dataset: Errors
420
+ Query: `event.type:error`
421
+ Y-axis: `count()`
422
+ Group by: `deploy.repo` tag
423
+
424
+ - [ ] **Step 3: Add "Failures by phase" widget**
425
+
426
+ Type: Bar chart
427
+ Dataset: Errors
428
+ Query: `event.type:error`
429
+ Y-axis: `count()`
430
+ Group by: `deploy.phase` tag
431
+
432
+ - [ ] **Step 4: Add "Failures by error type" widget**
433
+
434
+ Type: Table
435
+ Dataset: Errors
436
+ Query: `event.type:error`
437
+ Columns: `error.value`, `count()`, `last_seen()` (`deploy.error` is a span attribute, so it is not available on error events)
438
+
439
+ - [ ] **Step 5: Add "Transient failure trends" widget**
440
+
441
+ Type: Line chart
442
+ Dataset: Errors
443
+ Query: `level:warning`
444
+ Y-axis: `count()`
445
+ Group by: timestamp (day)
446
+
447
+ - [ ] **Step 6: Add "Deploy success vs failure" widget**
448
+
449
+ Type: Stacked area chart
450
+ Dataset: Spans
451
+ Query: `span.op:deploy`
452
+ Y-axis: `count()`
453
+ Group by: `span.status`
@@ -0,0 +1,105 @@
1
+ # Error Telemetry for bulletin-deploy
2
+
3
+ **Date**: 2026-03-24
4
+ **Status**: Approved
5
+ **Trigger**: t3rminal deploy failed 6 times on 2026-03-23 with zero visibility in Sentry
6
+
7
+ ## Problem
8
+
9
+ The telemetry module creates spans/traces for deploy phases, but:
10
+
11
+ 1. **No error events** — `Sentry.captureException()` is never called. Failures produce incomplete span trees but no Sentry Issues, so alerting is impossible.
12
+ 2. **No error context on spans** — When a span's callback throws, the span ends but carries no attributes describing what failed or why.
13
+ 3. **Span coverage gap** — `getPriceAndValidate` in `dotns.js` runs between the `wait-commitment-age` and `finalize-registration` spans with no span of its own. The 2026-03-23 failure (DotNS v1.1.0 rejecting names with trailing hyphens) hit this exact gap.
14
+ 4. **Transient failures invisible** — Chunk retries, RPC fallbacks, and contract reverts that self-recover leave no trace. Flakiness trends are undetectable until they escalate to full failures.
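The same 0.5.2 release also changes `stripTrailingDigits` in `dotns.js` to drop a hyphen left behind once trailing digits are removed. This appears aimed at the trailing-hyphen failure described in item 3. The character-class check at the top of `validateDomainLabel` (`/^[a-z0-9-]{3,}$/`) accepts a trailing hyphen, so that check alone would not reject such a label before submission. The function below copies the 0.5.2 body, and the sample labels are illustrative. Only a single trailing hyphen is removed, so `a--7` still ends in `-`. Whether that matters depends on DotNS rules this sketch does not model. If it does, `/-+$/` would strip a run of hyphens.

```javascript
// Copy of the 0.5.2 stripTrailingDigits body from dotns.js: remove trailing digits,
// then a single trailing hyphen if one is left behind.
function stripTrailingDigits(label) {
  return label.replace(/\d+$/, "").replace(/-$/, "");
}

for (const label of ["my-site-42", "site123", "abc", "a--7"]) {
  console.log(label, "->", stripTrailingDigits(label));
}
// Logs: my-site-42 -> my-site, site123 -> site, abc -> abc, a--7 -> a-
```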
15
+
16
+ ## Design
17
+
18
+ ### 1. Terminal error capture (telemetry.js)
19
+
20
+ Modify `withSpan` and `withDeploySpan` to catch errors before re-throwing.
21
+
22
+ **Critical implementation detail**: The try/catch + `captureException` must go **inside** the `Sentry.startSpan` callback, not outside it. The deploy span is only the active span within the callback scope — calling `captureException` outside would not associate the error event with the correct span/trace.
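A toy model makes the scoping argument concrete. It is not the Sentry SDK. `startSpan` and `captureException` here are stand-ins that keep the active span in a module-level variable, which only works for sequential code; the real SDK tracks the active span through async context. Capturing inside the callback attributes the event to the span. Capturing after `startSpan` has rejected finds no active span.

```javascript
// Toy stand-ins for startSpan/captureException; not the Sentry SDK.
let activeSpan = null;
const events = [];

async function startSpan(options, callback) {
  const previous = activeSpan;
  activeSpan = { name: options.name };
  try {
    return await callback(activeSpan);
  } finally {
    activeSpan = previous; // span ends: its parent (or nothing) is active again
  }
}

function captureException(error) {
  events.push({ error: error.message, span: activeSpan ? activeSpan.name : null });
}

async function failingPhase() {
  throw new Error("price validation failed");
}

async function demo() {
  // Capture inside the callback: the span is still active.
  await startSpan({ name: "deploy example" }, async () => {
    try {
      await failingPhase();
    } catch (error) {
      captureException(error);
    }
  });
  // Capture after startSpan has rejected: the span has already ended.
  try {
    await startSpan({ name: "deploy example" }, failingPhase);
  } catch (error) {
    captureException(error);
  }
  return events;
}

const captured = demo();
captured.then((list) => console.log(list));
```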
23
+
24
+ **`withSpan(op, description, attributes, fn)`**:
25
+ - Change the startSpan callback to intercept the `span` argument: `Sentry.startSpan({ ... }, async (span) => { ... })`
26
+ - Inside the callback: try/catch around `fn()`
27
+ - On error: `span.setAttribute("error.message", error.message)`, call `Sentry.captureException(error)`, re-throw
28
+ - Sentry auto-sets span status to ERROR on throw (keep this behavior)
29
+
30
+ **`withDeploySpan(domain, fn)`**:
31
+ - Same pattern: try/catch goes **inside** the startSpan callback to access the span and active scope
32
+ - On error: `span.setAttribute("deploy.status", "error")`, `span.setAttribute("deploy.error", error.message)`, call `Sentry.captureException(error)` unless an inner `withSpan` already captured it, re-throw
33
+ - Before `captureException`, call `Sentry.setTag("deploy.phase", "deploy")` (the root span's `op`; errors from inner phases are tagged with their own `op` by `withSpan`) so the error event carries the phase for dashboard queries (span attributes don't auto-propagate to error events)
34
+ - Set `Sentry.setTags` for `deploy.repo`, `deploy.branch`, `deploy.domain`, and `deploy.source` in the scope at span start so all error events within the deploy carry these tags
35
+ - The outer `try/finally` stays outside the startSpan callback for `Sentry.flush()`
36
+
37
+ ### 2. Transient failure capture
38
+
39
+ New helper in telemetry.js:
40
+
41
+ **`captureWarning(message, context)`**:
42
+ - Calls `Sentry.addBreadcrumb({ level: "warning", message, data: context })` — attaches to the timeline of any subsequent error event
43
+ - Calls `Sentry.captureMessage(message, { level: "warning", extra: context })` — creates a standalone Sentry Issue for the transient failure
44
+
45
+ **Volume note**: Every transient failure creates a standalone Sentry Issue. Sentry deduplicates by message fingerprint, so repeated chunk retries with the same message collapse into one issue with multiple events. If volume becomes a concern, `captureMessage` can be removed and breadcrumbs alone retained (they still appear in the timeline of any terminal error).
46
+
47
+ Call sites — **only at points where a failure is caught and retried**, not in generic methods:
48
+
49
+ **deploy.js — chunk retry loop** (~line 201):
50
+ - At the start of `for (const fail of failures)` loop body, when a chunk from the batch has failed and will be retried
51
+ - Also inside the retry loop catch (~line 212) for subsequent retry failures
52
+ - Context: `{ chunkIndex, attempt, maxRetries, error: error.message }`
53
+
54
+ **dotns.js — RPC endpoint fallback** (~line 319-323):
55
+ - When an RPC endpoint fails in `connect()` and the next endpoint will be tried
56
+ - Context: `{ endpoint, error: error.message, remainingEndpoints }`
57
+
58
+ **NOT in contractCall/contractTransaction generically** — those methods throw for both terminal and transient failures. Adding captureWarning there would duplicate errors already captured by `withSpan`. Only the specific retry/fallback sites above get captureWarning calls.
59
+
60
+ ### 3. Span coverage gap (dotns.js)
61
+
62
+ Wrap `getPriceAndValidate` in its own span in the `register` method:
63
+
64
+ ```
65
+ 2a-i. submit-commitment (deploy.dotns.submit-commitment)
66
+ 2a-ii. wait-commitment-age (deploy.dotns.wait-commitment-age)
67
+ 2a-iii. price-validation (deploy.dotns.price-validation) <-- NEW
68
+ 2a-iv. finalize-registration (deploy.dotns.finalize-registration) <-- renumbered
69
+ ```
70
+
71
+ This closes the gap that left the 2026-03-23 failure invisible in the trace tree.
72
+
73
+ ### 4. Dashboard widgets
74
+
75
+ Add to existing dashboard at `polkadot-community-foundation.sentry.io/dashboards/92523/`:
76
+
77
+ | Widget | Type | Groups by | Purpose |
78
+ |--------|------|-----------|---------|
79
+ | Failure rate over time | Line chart | Error events by day/hour | Trend and cluster detection |
80
+ | Failures by repo | Bar chart | `deploy.repo` tag | Which consuming repos hit failures most |
81
+ | Failures by phase | Bar chart | `deploy.phase` tag on error events | Storage vs DotNS vs specific sub-phase |
82
+ | Failures by error type | Table | Exception message (`error.value`) | "Contract reverted" vs "timed out" vs "RPC failed" |
83
+ | Transient failure trends | Line chart | Warning events over time | Flakiness early warning |
84
+ | Deploy success vs failure | Stacked area | Span status (ok/error) over time | At-a-glance health |
85
+
86
+ ## Files changed
87
+
88
+ | File | Change |
89
+ |------|--------|
90
+ | `src/telemetry.js` | Modify `withSpan`, `withDeploySpan`; add `captureWarning` |
91
+ | `src/deploy.js` | Add `captureWarning` calls in chunk retry loop |
92
+ | `src/dotns.js` | Add `captureWarning` call in RPC fallback; wrap `getPriceAndValidate` in span |
93
+
94
+ ## What this enables
95
+
96
+ - **Alerting**: Alert rules on error events (any deploy failure), warning events (transient failure trends), specific error messages (contract behavior changes)
97
+ - **Dashboards**: Failure rates, phase breakdown, repo breakdown, flakiness trends
98
+ - **Trace diagnostics**: Error attributes on spans show what failed and why; breadcrumb timeline shows the full sequence of events leading to failure
99
+ - **Coverage**: Both CI and local CLI paths go through `deploy()` -> `initTelemetry()` -> `withDeploySpan()`, so all changes apply to both
100
+
101
+ ## Out of scope
102
+
103
+ - Custom Sentry metrics (span durations already cover performance)
104
+ - Structured error types/codes (error messages are sufficient for now)
105
+ - Retry logic changes (separate concern from observability)
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "bulletin-deploy",
3
- "version": "0.5.0",
3
+ "version": "0.5.2",
4
4
  "private": false,
5
5
  "repository": {
6
6
  "type": "git",
package/src/deploy.js CHANGED
@@ -17,7 +17,7 @@ import { UnixFS } from "ipfs-unixfs";
17
17
  import { DotNS, fetchNonce, TX_TIMEOUT_MS } from "./dotns.js";
18
18
  import { cryptoWaitReady } from "@polkadot/util-crypto";
19
19
  import { derivePoolAccounts, fetchPoolAuthorizations, selectAccount, ensureAuthorized } from "./pool.js";
20
- import { initTelemetry, withSpan, withDeploySpan, setDeployAttribute, flush } from "./telemetry.js";
20
+ import { initTelemetry, withSpan, withDeploySpan, setDeployAttribute, captureWarning, flush } from "./telemetry.js";
21
21
 
22
22
  const BULLETIN_RPC = process.env.BULLETIN_RPC || "wss://paseo-bulletin-rpc.polkadot.io";
23
23
  const POOL_SIZE = parseInt(process.env.BULLETIN_POOL_SIZE || "10", 10);
@@ -199,6 +199,7 @@ export async function storeChunkedContent(chunks, { client: existingClient, unsa
199
199
  .filter(Boolean);
200
200
 
201
201
  for (const fail of failures) {
202
+ captureWarning("Chunk upload failed, retrying", { chunkIndex: fail.index + 1, maxRetries: MAX_CHUNK_RETRIES, error: fail.error?.message?.slice(0, 200) });
202
203
  let retried = false;
203
204
  for (let attempt = 1; attempt <= MAX_CHUNK_RETRIES; attempt++) {
204
205
  console.log(` Retrying chunk ${fail.index + 1} (attempt ${attempt}/${MAX_CHUNK_RETRIES})...`);
@@ -210,6 +211,7 @@ export async function storeChunkedContent(chunks, { client: existingClient, unsa
210
211
  retried = true;
211
212
  break;
212
213
  } catch (e) {
214
+ captureWarning("Chunk retry failed", { chunkIndex: fail.index + 1, attempt, maxRetries: MAX_CHUNK_RETRIES, error: e.message?.slice(0, 200) });
213
215
  console.log(` Retry ${attempt} failed: ${e.message?.slice(0, 80)}`);
214
216
  }
215
217
  }
package/src/dotns.js CHANGED
@@ -20,7 +20,7 @@ import {
20
20
  concatHex,
21
21
  } from "viem";
22
22
  import { CID } from "multiformats/cid";
23
- import { withSpan } from "./telemetry.js";
23
+ import { withSpan, captureWarning } from "./telemetry.js";
24
24
 
25
25
  export const RPC_ENDPOINTS = [
26
26
  "wss://asset-hub-paseo.dotters.network",
@@ -160,7 +160,7 @@ export function computeDomainTokenId(label) {
160
160
  return BigInt(node);
161
161
  }
162
162
  export function countTrailingDigits(label) { let count = 0; for (let i = label.length - 1; i >= 0; i--) { const code = label.charCodeAt(i); if (code >= 48 && code <= 57) count++; else break; } return count; }
163
- export function stripTrailingDigits(label) { return label.replace(/\d+$/, ""); }
163
+ export function stripTrailingDigits(label) { return label.replace(/\d+$/, "").replace(/-$/, ""); }
164
164
 
165
165
  export function validateDomainLabel(label) {
166
166
  if (!/^[a-z0-9-]{3,}$/.test(label)) throw new Error("Invalid domain label: must contain only lowercase letters, digits, and hyphens, min 3 chars");
@@ -318,6 +318,7 @@ export class DotNS {
318
318
  return this;
319
319
  } catch (e) {
320
320
  lastError = e;
321
+ captureWarning("DotNS RPC endpoint failed, trying next", { endpoint: rpc, error: e.message, remainingEndpoints: endpoints.length - endpoints.indexOf(rpc) - 1 });
321
322
  console.log(` Failed to connect to ${rpc}: ${e.message}`);
322
323
  if (this.client) { try { this.client.destroy(); } catch {} this.client = null; }
323
324
  }
@@ -525,8 +526,8 @@ export class DotNS {
525
526
  const { commitment, registration } = await this.generateCommitment(label, reverse);
526
527
  await withSpan("deploy.dotns.submit-commitment", "2a-i. submit-commitment", {}, () => this.submitCommitment(commitment));
527
528
  await withSpan("deploy.dotns.wait-commitment-age", "2a-ii. wait-commitment-age", {}, () => this.waitForCommitmentAge());
528
- const pricing = await this.getPriceAndValidate(label);
529
- await withSpan("deploy.dotns.finalize-registration", "2a-iii. finalize-registration", {}, () => this.finalizeRegistration(registration, pricing.priceWei));
529
+ const pricing = await withSpan("deploy.dotns.price-validation", "2a-iii. price-validation", {}, () => this.getPriceAndValidate(label));
530
+ await withSpan("deploy.dotns.finalize-registration", "2a-iv. finalize-registration", {}, () => this.finalizeRegistration(registration, pricing.priceWei));
530
531
  await this.verifyOwnership(label);
531
532
  console.log(`\n Registration complete!`);
532
533
  return { label, owner: this.evmAddress };
package/src/telemetry.js CHANGED
@@ -2,6 +2,8 @@
2
2
  // Set BULLETIN_DEPLOY_TELEMETRY=0 to disable.
3
3
 
4
4
  import { execSync } from "node:child_process";
5
+ import * as fs from "node:fs";
6
+ import * as path from "node:path";
5
7
 
6
8
  const DEFAULT_DSN = "https://e021c025d79c4c3ade2862a11f13c40b@o4509440811401216.ingest.de.sentry.io/4511093597405264";
7
9
  const DISABLED = process.env.BULLETIN_DEPLOY_TELEMETRY === "0";
@@ -25,10 +27,27 @@ export function initTelemetry() {
25
27
  });
26
28
  }
27
29
 
30
+ function extractRepoSlug(url) {
31
+ return url.replace(/.*github\.com[:/]/, "").replace(/\.git$/, "");
32
+ }
33
+
28
34
  function tryGitRemote() {
29
35
  try {
30
- return execSync("git remote get-url origin", { encoding: "utf-8" }).trim().replace(/.*github\.com[:/]/, "").replace(/\.git$/, "");
31
- } catch { return "unknown"; }
36
+ return extractRepoSlug(execSync("git remote get-url origin", { encoding: "utf-8" }).trim());
37
+ } catch { return undefined; }
38
+ }
39
+
40
+ function tryPackageJsonRepo() {
41
+ try {
42
+ const pkg = JSON.parse(fs.readFileSync(path.join(process.cwd(), "package.json"), "utf-8"));
43
+ const repo = typeof pkg.repository === "string" ? pkg.repository : pkg.repository?.url;
44
+ if (repo) return extractRepoSlug(repo);
45
+ } catch {}
46
+ return undefined;
47
+ }
48
+
49
+ export function resolveRepo(domain) {
50
+ return process.env.GITHUB_REPOSITORY || tryGitRemote() || tryPackageJsonRepo() || domain || "unknown";
32
51
  }
33
52
 
34
53
  function tryGitBranch() {
@@ -37,9 +56,9 @@ function tryGitBranch() {
37
56
  } catch { return "unknown"; }
38
57
  }
39
58
 
40
- function getDeployAttributes() {
59
+ function getDeployAttributes(domain) {
41
60
  return {
42
- "deploy.repo": process.env.GITHUB_REPOSITORY || tryGitRemote(),
61
+ "deploy.repo": resolveRepo(domain),
43
62
  "deploy.branch": process.env.GITHUB_HEAD_REF || process.env.GITHUB_REF_NAME || tryGitBranch(),
44
63
  "deploy.source": process.env.CI ? "ci" : "local",
45
64
  "deploy.pr": process.env.GITHUB_PR_NUMBER || undefined,
@@ -48,14 +67,42 @@ function getDeployAttributes() {
48
67
 
49
68
  export async function withSpan(op, description, attributes, fn) {
50
69
  if (!Sentry) return fn();
51
- return Sentry.startSpan({ op, name: description, attributes }, fn);
70
+ return Sentry.startSpan({ op, name: description, attributes }, async (span) => {
71
+ try {
72
+ return await fn();
73
+ } catch (error) {
74
+ span.setAttribute("error.message", error.message);
75
+ Sentry.setTag("deploy.phase", op);
76
+ Sentry.captureException(error);
77
+ error._sentryCaptured = true;
78
+ throw error;
79
+ }
80
+ });
52
81
  }
53
82
 
54
83
  export async function withDeploySpan(domain, fn) {
55
84
  if (!Sentry) return fn();
56
- const attrs = { ...getDeployAttributes(), "deploy.domain": domain };
85
+ const attrs = { ...getDeployAttributes(domain), "deploy.domain": domain };
57
86
  try {
58
- return await Sentry.startSpan({ op: "deploy", name: `deploy ${domain}`, attributes: attrs }, fn);
87
+ return await Sentry.startSpan({ op: "deploy", name: `deploy ${domain}`, attributes: attrs }, async (span) => {
88
+ Sentry.setTags({
89
+ "deploy.repo": attrs["deploy.repo"],
90
+ "deploy.branch": attrs["deploy.branch"],
91
+ "deploy.domain": domain,
92
+ "deploy.source": attrs["deploy.source"],
93
+ });
94
+ try {
95
+ return await fn();
96
+ } catch (error) {
97
+ span.setAttribute("deploy.status", "error");
98
+ span.setAttribute("deploy.error", error.message);
99
+ if (!error._sentryCaptured) {
100
+ Sentry.setTag("deploy.phase", "deploy");
101
+ Sentry.captureException(error);
102
+ }
103
+ throw error;
104
+ }
105
+ });
59
106
  } finally {
60
107
  await Sentry.flush(5000);
61
108
  }
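After reporting an error, `withSpan` marks it with `_sentryCaptured`. `withDeploySpan` skips `captureException` when that flag is set, so a failure inside a phase span reaches Sentry once, tagged with the innermost `deploy.phase`. The sketch below models that handshake. It uses a hypothetical in-memory stub (`makeFakeSentry`) instead of `@sentry/node`, and the op names `"deploy"` and `"dotns.register"` are illustrative.

The sketch makes one assumption that differs from the diff: the inner wrapper also checks the flag. In the diff, `withSpan` captures unconditionally. Two directly nested `withSpan` calls would therefore each report the same error, the second time under the outer phase. The sketch also sets the marker only on objects, because assigning a property to a primitive throws a `TypeError` in strict-mode code such as ES modules.

```javascript
// Hypothetical in-memory stand-in for @sentry/node: records captured errors
// and tags, and runs startSpan callbacks with a minimal span object.
function makeFakeSentry() {
  const captured = [];
  const tags = {};
  return {
    captured,
    tags,
    startSpan(_options, callback) {
      const span = { attributes: {}, setAttribute(key, value) { this.attributes[key] = value; } };
      return callback(span);
    },
    setTag(key, value) { tags[key] = value; },
    captureException(error) { captured.push(error); },
  };
}

// Report an error unless an inner wrapper already reported it.
function captureOnce(Sentry, error, phase) {
  const isObject = error !== null && typeof error === "object";
  if (isObject && error._sentryCaptured) return;
  Sentry.setTag("deploy.phase", phase);
  Sentry.captureException(error);
  // Only objects can carry the marker; primitives would throw in strict mode.
  if (isObject) error._sentryCaptured = true;
}

// Simplified withSpan: same try/catch shape as the diff, plus the guard.
async function withSpanSketch(Sentry, op, fn) {
  return Sentry.startSpan({ op, name: op }, async (span) => {
    try {
      return await fn();
    } catch (error) {
      span.setAttribute("error.message", String(error?.message ?? error));
      captureOnce(Sentry, error, op);
      throw error;
    }
  });
}

// Usage: an RPC failure two spans deep is reported exactly once.
const sentry = makeFakeSentry();
const outcome = withSpanSketch(sentry, "deploy", () =>
  withSpanSketch(sentry, "dotns.register", () => { throw new Error("rpc down"); })
).then(() => null, (error) => error);

outcome.then((error) => {
  // Prints: rpc down 1 dotns.register
  console.log(error.message, sentry.captured.length, sentry.tags["deploy.phase"]);
});
```

The error itself is rethrown unchanged at every level, so callers still see the original failure. Only the reporting is deduplicated.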
@@ -67,6 +114,14 @@ export function setDeployAttribute(key, value) {
67
114
  if (span) span.setAttribute(key, value);
68
115
  }
69
116
 
117
+ export function captureWarning(message, context) {
118
+ if (!Sentry) return;
119
+ try {
120
+ Sentry.addBreadcrumb({ level: "warning", message, data: context });
121
+ Sentry.captureMessage(message, { level: "warning", extra: context });
122
+ } catch { /* telemetry must never break the deploy */ }
123
+ }
124
+
70
125
  export async function flush() {
71
126
  if (!Sentry) return;
72
127
  await Sentry.flush(5000);
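`captureWarning` records each transient failure in two ways. It adds a breadcrumb, which Sentry attaches to later events from the same scope, and it sends a standalone warning-level event through `captureMessage`. Both calls sit inside a `try`/`catch`, so a telemetry fault cannot abort a deploy. The sketch below exercises that contract with a hypothetical `Sentry` stub whose `captureMessage` always throws. It drives the stub with a hypothetical `withRetries` helper shaped like the chunk retry loop the plan describes. Neither the stub nor the helper comes from `src/deploy.js`.

```javascript
// Hypothetical stub for the module-level Sentry handle. captureMessage throws
// to simulate a broken telemetry transport.
const events = [];
const Sentry = {
  addBreadcrumb(breadcrumb) { events.push(["breadcrumb", breadcrumb.message]); },
  captureMessage(message, context) {
    events.push(["message", message, context.level]);
    throw new Error("transport down");
  },
};

// Same body as captureWarning in src/telemetry.js.
function captureWarning(message, context) {
  if (!Sentry) return;
  try {
    Sentry.addBreadcrumb({ level: "warning", message, data: context });
    Sentry.captureMessage(message, { level: "warning", extra: context });
  } catch { /* telemetry must never break the deploy */ }
}

// Hypothetical retry helper: every failed attempt except the last is reported
// as a warning; only the final failure propagates to the caller.
async function withRetries(task, { attempts = 3, label = "chunk upload" } = {}) {
  for (let attempt = 1; ; attempt += 1) {
    try {
      return await task(attempt);
    } catch (error) {
      if (attempt >= attempts) throw error;
      captureWarning(`${label} failed, retrying`, { attempt, error: error.message });
    }
  }
}

// Usage: attempts 1 and 2 fail, attempt 3 succeeds despite the throwing stub.
const outcome = withRetries(async (attempt) => {
  if (attempt < 3) throw new Error(`timeout on attempt ${attempt}`);
  return "uploaded";
});

outcome.then((result) => {
  // Prints: uploaded 2
  console.log(result, events.filter(([kind]) => kind === "message").length);
});
```

Reporting retries as warnings rather than exceptions keeps transient noise out of the error count. The breadcrumbs still give context to the final exception if every attempt fails.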
package/test/test.js CHANGED
@@ -1,7 +1,8 @@
1
1
  import { test, describe } from "node:test";
2
2
  import assert from "node:assert/strict";
3
3
  import { createCID, encodeContenthash } from "../src/deploy.js";
4
- import { validateDomainLabel, fetchNonce, TX_TIMEOUT_MS } from "../src/dotns.js";
4
+ import { validateDomainLabel, stripTrailingDigits, fetchNonce, TX_TIMEOUT_MS } from "../src/dotns.js";
5
+ import { captureWarning, withSpan, withDeploySpan, resolveRepo } from "../src/telemetry.js";
5
6
 
6
7
  // ---------------------------------------------------------------------------
7
8
  // 1. createCID
@@ -86,7 +87,28 @@ describe("validateDomainLabel", () => {
86
87
  });
87
88
 
88
89
  // ---------------------------------------------------------------------------
89
- // 4. fetchNonce timeout
90
+ // 4. stripTrailingDigits
91
+ // ---------------------------------------------------------------------------
92
+ describe("stripTrailingDigits", () => {
93
+ test("strips trailing digits", () => {
94
+ assert.strictEqual(stripTrailingDigits("my-app00"), "my-app");
95
+ });
96
+
97
+ test("strips trailing hyphen after digits", () => {
98
+ assert.strictEqual(stripTrailingDigits("t3rminal-app-88-pr-80"), "t3rminal-app-88-pr");
99
+ });
100
+
101
+ test("handles no trailing digits", () => {
102
+ assert.strictEqual(stripTrailingDigits("my-app"), "my-app");
103
+ });
104
+
105
+ test("handles all digits suffix", () => {
106
+ assert.strictEqual(stripTrailingDigits("app-123"), "app");
107
+ });
108
+ });
109
+
110
+ // ---------------------------------------------------------------------------
111
+ // 5. fetchNonce timeout
90
112
  // ---------------------------------------------------------------------------
91
113
  describe("fetchNonce", () => {
92
114
  test(
@@ -112,3 +134,91 @@ describe("TX_TIMEOUT_MS", () => {
112
134
  assert.strictEqual(TX_TIMEOUT_MS, 90_000);
113
135
  });
114
136
  });
137
+
138
+ // ---------------------------------------------------------------------------
139
+ // 6. resolveRepo fallback chain
140
+ // ---------------------------------------------------------------------------
141
+ describe("resolveRepo", () => {
142
+ function withEnv(env, cwd, fn) {
143
+ const prev = process.env.GITHUB_REPOSITORY;
144
+ const prevCwd = process.cwd();
145
+ if (env !== undefined) process.env.GITHUB_REPOSITORY = env;
146
+ else delete process.env.GITHUB_REPOSITORY;
147
+ if (cwd) process.chdir(cwd);
148
+ try { return fn(); }
149
+ finally {
150
+ if (cwd) process.chdir(prevCwd);
151
+ if (prev !== undefined) process.env.GITHUB_REPOSITORY = prev;
152
+ else delete process.env.GITHUB_REPOSITORY;
153
+ }
154
+ }
155
+
156
+ test("prefers GITHUB_REPOSITORY env var", () => {
157
+ withEnv("myorg/myrepo", null, () => {
158
+ assert.strictEqual(resolveRepo("some-domain"), "myorg/myrepo");
159
+ });
160
+ });
161
+
162
+ test("falls back to git remote when GITHUB_REPOSITORY is unset", () => {
163
+ withEnv(undefined, null, () => {
164
+ const result = resolveRepo("fallback-domain");
165
+ assert.ok(result !== "unknown", `expected a resolved repo, got: ${result}`);
166
+ assert.ok(result !== "fallback-domain", `should not fall through to domain when git works`);
167
+ });
168
+ });
169
+
170
+ test("falls back to domain when git and package.json are unavailable", () => {
171
+ withEnv(undefined, "/tmp", () => {
172
+ assert.strictEqual(resolveRepo("instagram-dapp"), "instagram-dapp");
173
+ });
174
+ });
175
+
176
+ test("returns 'unknown' only when everything fails", () => {
177
+ withEnv(undefined, "/tmp", () => {
178
+ assert.strictEqual(resolveRepo(undefined), "unknown");
179
+ });
180
+ });
181
+ });
182
+
183
+ // ---------------------------------------------------------------------------
184
+ // 7. captureWarning
185
+ // ---------------------------------------------------------------------------
186
+ describe("captureWarning", () => {
187
+ test("does not throw when Sentry is disabled", () => {
188
+ assert.doesNotThrow(() => captureWarning("test warning", { key: "value" }));
189
+ });
190
+ });
191
+
192
+ // ---------------------------------------------------------------------------
193
+ // 8. withSpan error propagation
194
+ // ---------------------------------------------------------------------------
195
+ describe("withSpan", () => {
196
+ test("propagates errors from the callback", async () => {
197
+ await assert.rejects(
198
+ () => withSpan("test.op", "test span", {}, () => { throw new Error("span error"); }),
199
+ { message: "span error" }
200
+ );
201
+ });
202
+
203
+ test("returns the callback result on success", async () => {
204
+ const result = await withSpan("test.op", "test span", {}, () => "ok");
205
+ assert.strictEqual(result, "ok");
206
+ });
207
+ });
208
+
209
+ // ---------------------------------------------------------------------------
210
+ // 9. withDeploySpan error propagation
211
+ // ---------------------------------------------------------------------------
212
+ describe("withDeploySpan", () => {
213
+ test("propagates errors from the callback", async () => {
214
+ await assert.rejects(
215
+ () => withDeploySpan("test-domain", () => { throw new Error("deploy error"); }),
216
+ { message: "deploy error" }
217
+ );
218
+ });
219
+
220
+ test("returns the callback result on success", async () => {
221
+ const result = await withDeploySpan("test-domain", () => "deployed");
222
+ assert.strictEqual(result, "deployed");
223
+ });
224
+ });
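The four `stripTrailingDigits` tests define how the function handles trailing digit runs, but its body is not part of this diff. The following is a hypothetical regex-based version that satisfies exactly those cases. It can help when reasoning about inputs the tests do not cover. For example, under this sketch an all-digit label such as `"123"` becomes an empty string, and the real function may or may not behave the same way.

```javascript
// Hypothetical implementation: the real stripTrailingDigits() lives in
// src/dotns.js and is not shown in this diff.
function stripTrailingDigits(label) {
  // Remove a trailing run of digits together with any hyphens directly before it.
  return label.replace(/-*\d+$/, "");
}

// The four cases from the test suite above.
for (const label of ["my-app00", "t3rminal-app-88-pr-80", "my-app", "app-123"]) {
  console.log(`${label} -> ${stripTrailingDigits(label)}`);
}
```

The `$` anchor confines the match to the end of the label, so digits earlier in the name (`t3rminal`, `-88-`) are left alone.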
@@ -65,6 +65,7 @@ jobs:
65
65
  env:
66
66
  MNEMONIC: ${{ secrets.DOTNS_MNEMONIC }}
67
67
  SENTRY_DSN: ${{ secrets.SENTRY_DSN }}
68
+ GITHUB_REPOSITORY: ${{ github.repository }}
68
69
  NODE_OPTIONS: '--max-old-space-size=8192'
69
70
 
70
71
  - name: Comment on PR
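The workflow change exports `GITHUB_REPOSITORY`, which is the first source `resolveRepo` consults. Because the chain uses `||`, `tryGitRemote()` and `tryPackageJsonRepo()` are evaluated only when every earlier source is missing or empty. The sketch below makes that order testable by injecting the probes. `resolveRepoWith`, `gitRemote`, and `packageJsonRepo` are hypothetical names and are not exports of `src/telemetry.js`.

```javascript
// Hypothetical, dependency-injected variant of resolveRepo() from the diff.
// gitRemote and packageJsonRepo stand in for tryGitRemote() and
// tryPackageJsonRepo(); each returns a repo slug string or undefined.
function resolveRepoWith({ env = {}, gitRemote = () => undefined, packageJsonRepo = () => undefined }, domain) {
  // `||` short-circuits: later probes run only if every earlier source is
  // falsy, and an empty string ("") counts as missing.
  return env.GITHUB_REPOSITORY || gitRemote() || packageJsonRepo() || domain || "unknown";
}

// Count git probe calls to show that CI never needs to shell out to git.
let gitCalls = 0;
const countingGit = () => { gitCalls += 1; return "org/from-git"; };

const ci = resolveRepoWith({ env: { GITHUB_REPOSITORY: "myorg/myrepo" }, gitRemote: countingGit }, "my-app");
const local = resolveRepoWith({}, "my-app");      // no env, no git, no package.json
const nothing = resolveRepoWith({}, undefined);   // every source missing

console.log(ci, local, nothing, gitCalls); // myorg/myrepo my-app unknown 0
```

Injecting the probes avoids depending on the test runner's working directory. The `process.chdir("/tmp")` approach in the `resolveRepo` tests does depend on it.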