@heystack/otel 0.2.1 → 0.3.2

package/README.md CHANGED
@@ -13,7 +13,7 @@ Always read your key from the environment — never paste it into source:
13
13
  HEYSTACK_API_KEY=sk_live_…
14
14
  ```
15
15
 
16
- > **Requires `@heystack/otel` `>=0.2.0`.** See [Migration](#migration--versioning) below.
16
+ > **Requires `@heystack/otel` `>=0.3.0` (prefer `>=0.3.2`).** 0.3.2 fixes the no-op `suppressTracing` (feedback-loop guard) on the Workers/Next-on-OpenNext path and adds the Workers `nodejs_compat` requirement. See [Migration](#migration--versioning) below.
17
17
 
18
18
  ## Runtime matrix
19
19
 
@@ -36,6 +36,24 @@ initHeystack({ apiKey: process.env.HEYSTACK_API_KEY, service: "my-app" });
36
36
 
37
37
  This enables auto-instrumentations (HTTP, Express, etc.) so you get spans without manual wiring.
38
38
 
39
+ ### Slimming down auto-instrumentations (cost)
40
+
41
+ The default `getNodeAutoInstrumentations()` eagerly patches **~40 libraries** (HTTP, DNS, fs, net, gRPC, and every popular DB/HTTP client). That adds startup time and per-call overhead even for libraries your app never uses. To load only what you need, pass your own `instrumentations` array:
42
+
43
+ ```ts
44
+ import { initHeystack } from "@heystack/otel/node";
45
+ import { HttpInstrumentation } from "@opentelemetry/instrumentation-http";
46
+ import { ExpressInstrumentation } from "@opentelemetry/instrumentation-express";
47
+
48
+ initHeystack({
49
+ apiKey: process.env.HEYSTACK_API_KEY,
50
+ service: "my-app",
51
+ instrumentations: [new HttpInstrumentation(), new ExpressInstrumentation()],
52
+ });
53
+ ```
54
+
55
+ `instrumentations` defaults to `[getNodeAutoInstrumentations()]`. Passing your own array replaces the default entirely (it's ignored when `autoInstrument: false`). `initHeystack` is also **idempotent** as of 0.3.2 — calling it again returns the already-started SDK (no duplicate instrumentations or signal handlers, which matters across Next dev-server reloads).
56
+
39
57
  ## Next.js (any deploy target, including Cloudflare/OpenNext)
40
58
 
41
59
  In `instrumentation.ts` at the project root:
@@ -70,6 +88,18 @@ export default instrument(
70
88
  );
71
89
  ```
72
90
 
91
+ As of **0.3.0** `instrument()` registers the **global** tracer provider and creates the per-request SERVER span via the global tracer, so nested spans created through the global `trace.getTracer()` API (framework/library/manual) also export — you get a trace tree, not a lone SERVER span.
92
+
93
+ > **Requires `nodejs_compat` on workerd.** As of **0.3.2** the SDK registers an OpenTelemetry **ContextManager** at init (see below), backed by `AsyncLocalStorage` from `node:async_hooks`. On Cloudflare Workers that means your `wrangler.toml` must enable the Node.js compatibility flag:
94
+ > ```toml
95
+ > compatibility_flags = ["nodejs_compat"]
96
+ > ```
97
+ > If `node:async_hooks` is unavailable, the SDK transparently falls back to a synchronous stack-based ContextManager (no extra dependency) — suppression still works, but cross-`await` parent linking and per-request context isolation degrade to best-effort.
98
+
99
+ ### Why a ContextManager (0.3.2)
100
+
101
+ `context.with(...)` in OpenTelemetry is a **no-op unless a ContextManager is registered** with the global API. Before 0.3.2 the Workers path registered only a tracer provider, so `suppressTracing()` — the primary defence against the self-trace feedback loop — silently did nothing in production (the exporter's own `POST /v1/traces` could be re-traced by host fetch auto-instrumentation, looping). As of **0.3.2** the SDK registers a ContextManager exactly once at init. With `AsyncLocalStorageContextManager` (the default, on Node and on workerd under `nodejs_compat`) you also get **cross-`await` parent→child span linking** and **per-request context isolation** — concurrent requests no longer share or clobber the active span.
102
+
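The no-op behaviour described above can be modelled without any OpenTelemetry dependency. The following is a minimal sketch, not the package's implementation: `Ctx`, `StackContextManager`, `noopManager`, and `SUPPRESS` are hypothetical names chosen for illustration. It shows why a suppression flag set through `with(...)` is only visible to the wrapped function when a manager actually tracks the active context.

```typescript
// Hypothetical, dependency-free model of a context manager. Not the SDK's code.
type Ctx = Readonly<Record<string, unknown>>;
const ROOT: Ctx = {};

class StackContextManager {
  private stack: Ctx[] = [];

  // The active context is the top of the stack, or ROOT when nothing is running.
  active(): Ctx {
    return this.stack.length > 0 ? this.stack[this.stack.length - 1] : ROOT;
  }

  // Run fn with ctx active, then restore the previous context.
  with<T>(ctx: Ctx, fn: () => T): T {
    this.stack.push(ctx);
    try {
      return fn();
    } finally {
      this.stack.pop();
    }
  }
}

// Models "no manager registered": with() just calls fn, active() never changes.
const noopManager = {
  active: (): Ctx => ROOT,
  with: <T>(_ctx: Ctx, fn: () => T): T => fn(),
};

const SUPPRESS = "suppress-tracing";
const suppressed: Ctx = { [SUPPRESS]: true };
const isSuppressed = (c: Ctx): boolean => c[SUPPRESS] === true;

const manager = new StackContextManager();
// With a manager, the wrapped function observes the suppression flag.
const seenWithManager = manager.with(suppressed, () => isSuppressed(manager.active()));
// Without one, the flag is silently lost.
const seenWithoutManager = noopManager.with(suppressed, () => isSuppressed(noopManager.active()));
// After with() returns, the previous context is restored.
const afterWith = isSuppressed(manager.active());
```

A stack like this only tracks synchronous nesting: once the wrapped function hits an `await`, `with()` has already popped the context, which is the limitation an `AsyncLocalStorage`-backed manager avoids.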
73
103
  `instrument()` must be the **outermost** wrapper if other middleware also wraps the handler, so the request span covers everything inside:
74
104
 
75
105
  ```ts
@@ -78,17 +108,64 @@ export default instrument(withOtherMiddleware(worker), { service: "my-worker" })
78
108
 
79
109
  Set the key as a secret: `wrangler secret put HEYSTACK_API_KEY`.
80
110
 
111
+ As of **0.3.1** `instrument()` **forwards every other handler your Worker exports** — `queue`, `scheduled`, `tail`, etc. — untouched, so wrapping never drops a handler Cloudflare requires for deploy (it previously returned only `{ fetch }`, which broke Queue/Cron Workers). On top of forwarding, `queue` and `scheduled` are themselves traced when present: each gets a root span via the global tracer (`queue <queueName>` as a CONSUMER span with batch attributes; `scheduled <cron>` as an INTERNAL span with the cron attribute), flushed via `ctx.waitUntil` just like `fetch`.
112
+
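The forwarding behaviour can be pictured as "spread the handler, then replace only the traced keys". The sketch below is an assumption-laden illustration, not the package's code: `wrapHandler`, `Handler`, and the `traced` array are hypothetical, requests are modelled as plain strings, and a real wrapper would start and end spans instead of pushing to an array.

```typescript
// Hypothetical sketch of "spread the handler, replace only the traced keys".
type Handler = { fetch?: (req: string) => string; [key: string]: unknown };

// Records which entrypoints ran; stands in for span creation.
const traced: string[] = [];

function wrapHandler<H extends Handler>(handler: H): H {
  // The spread forwards every sibling handler (queue, scheduled, tail, ...) untouched.
  const wrapped: Handler = { ...handler };
  const originalFetch: Handler["fetch"] = handler.fetch;
  if (originalFetch) {
    wrapped.fetch = (req: string) => {
      traced.push(`fetch ${req}`); // a real wrapper would start and end a span here
      return originalFetch(req);
    };
  }
  return wrapped as H;
}

const worker = {
  fetch: (req: string) => `ok:${req}`,
  tail: () => "tail-called",
};

const instrumented = wrapHandler(worker);
const response = instrumented.fetch("/health");
const tailResult = instrumented.tail();
```

Returning only `{ fetch }` instead of spreading first is the shape that would drop `tail` (or `queue`/`scheduled`) from the exported object.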
113
+ ### Durable Objects are NOT covered by `instrument()`
114
+
115
+ `instrument()` wraps the keys of the **default-export handler object** (`fetch`/`queue`/`scheduled`/…). **Durable Objects are separate named class exports**, so spreading the handler object does not touch them: a DO's `fetch`/`alarm` methods run **untraced** even when your Worker's default export is wrapped.
116
+
117
+ To trace a Durable Object, instrument it manually with the global tracer (which `instrument()` / `initHeystackWorkers()` already registered) and flush per invocation:
118
+
119
+ ```ts
120
+ import { trace, SpanKind, context } from "@opentelemetry/api";
121
+ import { flushHeystack } from "@heystack/otel/workers";
122
+
123
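+ export class Counter {
+ constructor(private readonly state: DurableObjectState) {} // assumed: needed because fetch() reads this.state below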
124
+ async fetch(req: Request): Promise<Response> {
125
+ const tracer = trace.getTracer("heystack");
126
+ const span = tracer.startSpan(`DO ${new URL(req.url).pathname}`, {
127
+ kind: SpanKind.SERVER,
128
+ });
129
+ try {
130
+ return await context.with(trace.setSpan(context.active(), span), async () => {
131
+ // ...your DO logic...
132
+ return new Response("ok");
133
+ });
134
+ } finally {
135
+ span.end();
136
+ this.state.waitUntil(flushHeystack()); // ensure the export POST completes
137
+ }
138
+ }
139
+ }
140
+ ```
141
+
142
+ The default export still needs to be wrapped with `instrument()` (or `initHeystackWorkers()` called) so the global provider + ContextManager are registered before the DO runs.
143
+
81
144
  ## Flushing
82
145
 
83
146
  On Workers/edge the export is a `fetch()` POST, and the isolate can be torn down the instant your handler returns. **You must let that POST complete or the trace is silently dropped** — this is the #1 cause of flaky Workers tracing. `flushHeystack()` and `instrument()`'s built-in flush both await the in-flight fetch (not just the OTel span processor, which does *not* wait for it).
84
147
 
85
148
  - **Standalone Workers (`instrument()`)** — flushes automatically. After the response it `ctx.waitUntil`s a promise that drains both the span processor and the exporter's in-flight fetch, so the POST finishes before the isolate is killed. No action needed.
86
- - **Next on workerd (`registerHeystack`)** — there's no per-request `ExecutionContext`, so spans are **not** auto-flushed. For guaranteed delivery, `import { flushHeystack } from "@heystack/otel/workers"` and call it from a response hook (or `ctx.waitUntil(flushHeystack())` if you have a ctx) after handling a request. `flushHeystack()` awaits the export fetch.
149
+ - **Next on Cloudflare/OpenNext (`registerHeystack`)** — as of **0.3.0** this flushes automatically when `@opennextjs/cloudflare` is present: the export runs inside the Cloudflare request context, so the exporter borrows that request's `ctx.waitUntil` (via OpenNext's `getCloudflareContext`) to keep the isolate alive until the POST completes. **No manual hook needed.** For other workerd setups *without* `@opennextjs/cloudflare`, `import { flushHeystack } from "@heystack/otel/workers"` and call it from a response hook (or `ctx.waitUntil(flushHeystack())` if you have a ctx) or pass an explicit `waitUntil` to `initHeystackWorkers` (highest priority). `flushHeystack()` awaits the export fetch.
87
150
  - **Node (`initHeystack`)** — flushes on `SIGTERM`/`SIGINT` automatically.
88
151
 
152
+ ## No feedback loop with host fetch auto-instrumentation
153
+
154
+ As of **0.3.1** the exporter **suppresses tracing for its own ingest POST**, and **0.3.2 makes that suppression actually take effect in production**. On Next/OpenNext the host auto-instruments outbound `fetch`, so without this the exporter's `POST /v1/traces` became a CLIENT span → exported → re-captured → a sustained loop (wall-to-wall identical `fetch POST .../v1/traces` spans). The POST runs inside an OpenTelemetry tracing-suppressed context (`suppressTracing`) — but `context.with()` is a no-op unless a ContextManager is registered, which 0.3.1 did not do, so suppression silently did nothing. **0.3.2 registers a ContextManager** (see [Why a ContextManager](#why-a-contextmanager-032)), so the POST is genuinely suppressed.
155
+
156
+ As belt-and-suspenders the exporter also drops any span whose HTTP target points at the configured ingest origin. As of **0.3.2** that match is **hostname-accurate**: full-URL attributes (`url.full`, `http.url`) are parsed and compared on `.hostname` (case-insensitive, port-stripped) so a sibling domain like `myingest.heystack.dev` is no longer a false positive and an explicit port like `ingest.heystack.dev:443` is correctly matched; host-only attributes (`server.address`, `net.peer.name`, `net.peer.hostname`, `http.host`, `peer.address`) are port-stripped and compared by hostname.
157
+
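As a rough illustration of what a hostname-accurate match means, here is a minimal sketch. It is not the exporter's code: `hostOnly`, `hostnameOfUrl`, and `pointsAtIngest` are hypothetical helpers, and only the attribute names are taken from the text above.

```typescript
// Hypothetical helpers; names and behaviour are illustrative assumptions.

// Strip an optional ":port" suffix from a host-only value and lower-case it.
// Note: bare IPv6 literals are not handled by this simple pattern.
function hostOnly(value: string): string {
  return value.replace(/:\d+$/, "").toLowerCase();
}

// Parse a full URL and return its lower-cased hostname, or undefined if invalid.
function hostnameOfUrl(value: string): string | undefined {
  try {
    return new URL(value).hostname.toLowerCase();
  } catch {
    return undefined;
  }
}

const URL_ATTRS = ["url.full", "http.url"];
const HOST_ATTRS = ["server.address", "net.peer.name", "net.peer.hostname", "http.host", "peer.address"];

// True when any known attribute names exactly the ingest hostname.
function pointsAtIngest(attrs: Record<string, unknown>, ingestHost: string): boolean {
  for (const key of URL_ATTRS) {
    const v = attrs[key];
    if (typeof v === "string" && hostnameOfUrl(v) === ingestHost) return true;
  }
  for (const key of HOST_ATTRS) {
    const v = attrs[key];
    if (typeof v === "string" && hostOnly(v) === ingestHost) return true;
  }
  return false;
}
```

Comparing parsed hostnames for equality, rather than searching for the ingest host as a substring, is what keeps `myingest.heystack.dev` from matching `ingest.heystack.dev`.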
89
158
  ## Migration / versioning
90
159
 
91
- - **Pin `@heystack/otel` `>=0.2.0`** — the workerd-aware `/next` path and `initHeystackWorkers` / `flushHeystack` exports were added in 0.2.0.
160
+ - **`0.3.2`** — runtime-correctness fixes:
161
+ - **ContextManager registered (CRITICAL).** 0.3.1's `suppressTracing` was a no-op because no ContextManager was registered, so the exporter's ingest POST could still be re-traced into a feedback loop. 0.3.2 registers an `AsyncLocalStorageContextManager` (Node + workerd under `nodejs_compat`; sync stack-manager fallback otherwise) so suppression works — and as a bonus you get cross-`await` span parenting + per-request context isolation. **Workers now require `nodejs_compat`.** New dependency: `@opentelemetry/context-async-hooks`.
162
+ - **Hostname-accurate self-span filter** (no sibling-domain false positives; `host:port` now matched; more host-only attrs covered).
163
+ - **OpenNext accessor race fixed** — the `getCloudflareContext` accessor now loads eagerly, and early exports that fire before it resolves are still handed to `ctx.waitUntil`; accessor/`waitUntil` failures are now logged (once) instead of silently swallowed.
164
+ - **Node SDK hardening** — `initHeystack` is idempotent (cached singleton) and registers SIGTERM/SIGINT handlers at most once (no leak across Next dev reloads); new optional `instrumentations` field to load a slimmer instrumentation set.
165
+ - **Drain timeout** — `instrument()`'s `ctx.waitUntil` flush is raced against an ~8s timeout so a hung ingest can't pin the isolate to its CPU limit.
166
+ - **Bundler hints** — `/next`'s runtime-selected dynamic imports carry `@vite-ignore` + `webpackIgnore` so a Node build doesn't bundle the workers path (and vice-versa).
167
+ - **`0.3.1`** — `instrument()` now **forwards `queue`/`scheduled`/`tail` (and any other handler)** instead of returning only `{ fetch }`, and traces `queue`/`scheduled`; the exporter **suppresses self-tracing** on its ingest POST (note: only effective from 0.3.2, which registers the ContextManager). Both are production-reproduced bug fixes; upgrade is recommended for any Queue/Cron Worker or Next-on-OpenNext app.
168
+ - **Pin `@heystack/otel` `>=0.3.0`** — 0.3.0 makes Next-on-OpenNext auto-flush via the Cloudflare request context, hardens workerd detection (uses the `WebSocketPair` global so it survives `nodejs_compat`), and has `instrument()` set the global provider so nested spans export. The workerd-aware `/next` path and `initHeystackWorkers` / `flushHeystack` exports were added in 0.2.0.
92
169
  - The pre-0.1.0 top-level default `initHeystack({ apiKey })` is **gone**. Use the subpath entries: `@heystack/otel/node`, `@heystack/otel/next`, `@heystack/otel/workers`. The root `@heystack/otel` entry now exposes only pure helpers (`buildExporterConfig`, types).
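
The drain timeout listed under 0.3.2 is, in general terms, a race between the flush promise and a timer. The sketch below shows one way to write such a race; `raceWithTimeout` is a hypothetical helper, not an export of this package, and the timer-cleanup detail is an assumption about good practice rather than something the changelog states.

```typescript
// Hypothetical helper: resolve when `work` settles or after `ms`, whichever comes first.
async function raceWithTimeout(
  work: Promise<unknown>,
  ms: number,
): Promise<"done" | "timeout"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });
  try {
    // A rejected flush counts as finished: the caller only needs an upper bound on waiting.
    return await Promise.race([
      work.then((): "done" => "done", (): "done" => "done"),
      timeout,
    ]);
  } finally {
    // Clear the timer so a fast flush does not leave it pending.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Under these assumptions, a Worker would hand something like `raceWithTimeout(flushHeystack(), 8000)` to `ctx.waitUntil`, so a hung ingest releases the isolate after roughly eight seconds.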
93
170
 
94
171
  ## Verify it's working
package/dist/next.d.ts CHANGED
@@ -10,9 +10,12 @@ import type { HeystackOptions } from "./core.js";
10
10
  * and use the fetch-based exporter (@heystack/otel/workers) instead. On the Edge
11
11
  * runtime neither SDK can run, so it's a no-op.
12
12
  *
13
- * FLUSH (workerd): there is no per-request `ExecutionContext` on this path, so
14
- * spans are NOT auto-flushed and the export fetch may be cut off when the
15
- * isolate is torn down. For guaranteed delivery, import and call
13
+ * FLUSH (workerd): there is no per-request `ExecutionContext` handed to this
14
+ * function, but on Next-on-OpenNext the export runs inside the Cloudflare
15
+ * request context, so the exporter automatically borrows that request's
16
+ * `ctx.waitUntil` (via `@opennextjs/cloudflare`'s `getCloudflareContext`) to
17
+ * keep the isolate alive until the export POST completes — no app hook needed.
18
+ * For other workerd setups without `@opennextjs/cloudflare`, import and call
16
19
  * `flushHeystack()` from `@heystack/otel/workers` in a response hook (or via
17
20
  * `ctx.waitUntil(flushHeystack())` if you have a ctx) — it awaits the export.
18
21
  */
package/dist/next.js CHANGED
@@ -9,9 +9,12 @@
9
9
  * and use the fetch-based exporter (@heystack/otel/workers) instead. On the Edge
10
10
  * runtime neither SDK can run, so it's a no-op.
11
11
  *
12
- * FLUSH (workerd): there is no per-request `ExecutionContext` on this path, so
13
- * spans are NOT auto-flushed and the export fetch may be cut off when the
14
- * isolate is torn down. For guaranteed delivery, import and call
12
+ * FLUSH (workerd): there is no per-request `ExecutionContext` handed to this
13
+ * function, but on Next-on-OpenNext the export runs inside the Cloudflare
14
+ * request context, so the exporter automatically borrows that request's
15
+ * `ctx.waitUntil` (via `@opennextjs/cloudflare`'s `getCloudflareContext`) to
16
+ * keep the isolate alive until the export POST completes — no app hook needed.
17
+ * For other workerd setups without `@opennextjs/cloudflare`, import and call
15
18
  * `flushHeystack()` from `@heystack/otel/workers` in a response hook (or via
16
19
  * `ctx.waitUntil(flushHeystack())` if you have a ctx) — it awaits the export.
17
20
  */
@@ -26,15 +29,28 @@ export async function registerHeystack(o) {
26
29
  }
27
30
  // Detect Cloudflare workerd (OpenNext / Workers). NEXT_RUNTIME is "nodejs" there,
28
31
  // but the Node SDK's node:http exporter can't send — use the fetch exporter instead.
32
+ //
33
+ // The old heuristic combined `caches.default` with `!process.versions.node`,
34
+ // which is WRONG under `nodejs_compat`: that compat layer polyfills
35
+ // `process.versions.node`, so the `!` is false and we'd pick the Node path
36
+ // (which silently exports nothing). Use Cloudflare-only signals that survive
37
+ // nodejs_compat instead: the `Cloudflare-Workers` UA and the `WebSocketPair`
38
+ // global (a Workers-only global present even under nodejs_compat).
29
39
  const g = globalThis;
30
40
  const onWorkerd = g.navigator?.userAgent === "Cloudflare-Workers" ||
31
- (typeof g.caches?.default !== "undefined" && !g.process?.versions?.node);
41
+ typeof g.WebSocketPair !== "undefined";
32
42
  if (onWorkerd) {
33
- const { initHeystackWorkers } = await import("./workers.js");
43
+ // Bundler-ignore hints: a Node build must not eagerly bundle the workers
44
+ // path (and a workers build must not bundle the node path) — the import is
45
+ // runtime-selected per environment. `@vite-ignore` covers Vite/Rollup;
46
+ // `webpackIgnore` covers Next/Turbopack/webpack.
47
+ const { initHeystackWorkers } = await import(
48
+ /* @vite-ignore */ /* webpackIgnore: true */ "./workers.js");
34
49
  initHeystackWorkers({ apiKey, service: o.service, endpoint: o.endpoint });
35
50
  }
36
51
  else {
37
- const { initHeystack } = await import("./node.js");
52
+ const { initHeystack } = await import(
53
+ /* @vite-ignore */ /* webpackIgnore: true */ "./node.js");
38
54
  initHeystack({
39
55
  apiKey,
40
56
  service: o.service,
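
The detection change in this hunk can be exercised in isolation. The sketch below is a hedged model, not the shipped code: `GlobalsLike`, `oldHeuristic`, and `newHeuristic` are hypothetical names, and the test objects are stand-ins for `globalThis` in each environment. The two expressions themselves are copied from the removed and added lines of the diff.

```typescript
// Hypothetical shape of the globals the checks read; only these fields matter.
interface GlobalsLike {
  navigator?: { userAgent?: string };
  caches?: { default?: unknown };
  process?: { versions?: { node?: string } };
  WebSocketPair?: unknown;
}

// The removed check: breaks when nodejs_compat polyfills process.versions.node.
function oldHeuristic(g: GlobalsLike): boolean {
  return (
    g.navigator?.userAgent === "Cloudflare-Workers" ||
    (typeof g.caches?.default !== "undefined" && !g.process?.versions?.node)
  );
}

// The added check: relies on a Workers-only global instead.
function newHeuristic(g: GlobalsLike): boolean {
  return (
    g.navigator?.userAgent === "Cloudflare-Workers" ||
    typeof g.WebSocketPair !== "undefined"
  );
}

// Stand-in for workerd under nodejs_compat, modelled without the UA signal.
const workerdWithCompat: GlobalsLike = {
  caches: { default: {} },
  process: { versions: { node: "22.0.0" } },
  WebSocketPair: function WebSocketPair() {},
};

// Stand-in for a plain Node process.
const plainNode: GlobalsLike = {
  navigator: { userAgent: "Node.js/22" },
  process: { versions: { node: "22.0.0" } },
};

const oldOnCompat = oldHeuristic(workerdWithCompat);
const newOnCompat = newHeuristic(workerdWithCompat);
const newOnNode = newHeuristic(plainNode);
```

In this model the old check misclassifies the `nodejs_compat` environment as Node, while the new one identifies it as workerd and still leaves plain Node alone.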
package/dist/node.d.ts CHANGED
@@ -1,12 +1,33 @@
1
1
  import { NodeSDK } from "@opentelemetry/sdk-node";
2
+ import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
2
3
  import { type HeystackOptions } from "./core.js";
4
+ /**
5
+ * The element type the NodeSDK `instrumentations` config accepts (an OTel
6
+ * `Instrumentation` or array thereof). Derived from `getNodeAutoInstrumentations`
7
+ * so we don't take a direct dependency on `@opentelemetry/instrumentation` just
8
+ * for a type.
9
+ */
10
+ type InstrumentationConfigItem = ReturnType<typeof getNodeAutoInstrumentations>[number];
3
11
  export interface NodeOptions extends HeystackOptions {
4
12
  /** Enable OTel diagnostic logging to console to confirm export. */
5
13
  debug?: boolean;
6
14
  /** Set false to skip auto-instrumentations (you'll then only get framework/manual spans). Default true. */
7
15
  autoInstrument?: boolean;
16
+ /**
17
+ * Provide your own instrumentation array instead of the default
18
+ * `getNodeAutoInstrumentations()`. Use this to load a slimmer set (the default
19
+ * eagerly patches ~40 libraries — DNS, fs, net, gRPC, every popular DB/HTTP
20
+ * client — which adds startup cost and overhead even for libs you don't use).
21
+ * Ignored when `autoInstrument === false`. Example:
22
+ * import { HttpInstrumentation } from "@opentelemetry/instrumentation-http";
23
+ * initHeystack({ ..., instrumentations: [new HttpInstrumentation()] });
24
+ */
25
+ instrumentations?: InstrumentationConfigItem[];
8
26
  }
9
27
  /** Initialise Heystack tracing on a Node runtime. Call once, as early as possible. Returns the started SDK. */
10
28
  export declare function initHeystack(o: NodeOptions): NodeSDK;
11
- /** Flush + shutdown the SDK on SIGTERM/SIGINT so short-lived processes don't lose the last batch. */
29
+ /** Flush + shutdown the SDK on SIGTERM/SIGINT so short-lived processes don't lose the last batch. Registers handlers at most once. */
12
30
  export declare function shutdownOnSignals(sdk: NodeSDK): void;
31
+ /** Reset the cached SDK + signal-handler guard. Internal/testing helper. */
32
+ export declare function __resetNodeSdk(): void;
33
+ export {};
package/dist/node.js CHANGED
@@ -3,22 +3,47 @@ import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
3
3
  import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
4
4
  import { diag, DiagConsoleLogger, DiagLogLevel } from "@opentelemetry/api";
5
5
  import { buildExporterConfig } from "./core.js";
6
+ /**
7
+ * Process-level guard so SIGTERM/SIGINT handlers are registered AT MOST ONCE,
8
+ * even if `initHeystack` is called repeatedly (e.g. Next dev server reloads).
9
+ * Without this, every re-init leaked another pair of signal listeners and
10
+ * eventually tripped Node's MaxListenersExceededWarning.
11
+ */
12
+ let _signalHandlersRegistered = false;
13
+ /**
14
+ * Cached singleton SDK. `initHeystack` is meant to be called once; on repeat
15
+ * calls we return the already-started SDK instead of constructing/starting a
16
+ * second NodeSDK (which would double-register instrumentations and signal
17
+ * handlers).
18
+ */
19
+ let _sdk = null;
6
20
  /** Initialise Heystack tracing on a Node runtime. Call once, as early as possible. Returns the started SDK. */
7
21
  export function initHeystack(o) {
22
+ // Idempotent: a second call returns the cached SDK rather than starting a new
23
+ // one (which would duplicate instrumentations + signal handlers).
24
+ if (_sdk)
25
+ return _sdk;
8
26
  if (o.debug)
9
27
  diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.DEBUG);
10
28
  const cfg = buildExporterConfig(o);
29
+ const instrumentations = o.autoInstrument === false
30
+ ? []
31
+ : (o.instrumentations ?? [getNodeAutoInstrumentations()]);
11
32
  const sdk = new NodeSDK({
12
33
  serviceName: o.service,
13
34
  traceExporter: new OTLPTraceExporter({ url: cfg.url, headers: cfg.headers }),
14
- instrumentations: o.autoInstrument === false ? [] : [getNodeAutoInstrumentations()],
35
+ instrumentations,
15
36
  });
16
37
  sdk.start();
17
38
  shutdownOnSignals(sdk);
39
+ _sdk = sdk;
18
40
  return sdk;
19
41
  }
20
- /** Flush + shutdown the SDK on SIGTERM/SIGINT so short-lived processes don't lose the last batch. */
42
+ /** Flush + shutdown the SDK on SIGTERM/SIGINT so short-lived processes don't lose the last batch. Registers handlers at most once. */
21
43
  export function shutdownOnSignals(sdk) {
44
+ if (_signalHandlersRegistered)
45
+ return;
46
+ _signalHandlersRegistered = true;
22
47
  const stop = () => {
23
48
  sdk
24
49
  .shutdown()
@@ -28,3 +53,8 @@ export function shutdownOnSignals(sdk) {
28
53
  process.once("SIGTERM", stop);
29
54
  process.once("SIGINT", stop);
30
55
  }
56
+ /** Reset the cached SDK + signal-handler guard. Internal/testing helper. */
57
+ export function __resetNodeSdk() {
58
+ _sdk = null;
59
+ _signalHandlersRegistered = false;
60
+ }
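
The singleton-plus-guard pattern in this file can be reduced to a few lines. The following sketch uses stand-ins so it runs without `@opentelemetry/sdk-node` or real signal handlers: `FakeSdk`, `initOnce`, `registerSignalHandlersOnce`, and the `listeners` array are all hypothetical and exist only to make the control flow observable.

```typescript
// Hypothetical stand-ins for the SDK and the signal API so the sketch runs anywhere.
interface FakeSdk {
  id: number;
}

let cachedSdk: FakeSdk | null = null;
let signalHandlersRegistered = false;
let sdksStarted = 0;
const listeners: string[] = [];

function registerSignalHandlersOnce(): void {
  if (signalHandlersRegistered) return; // guard: at most one pair of listeners
  signalHandlersRegistered = true;
  listeners.push("SIGTERM", "SIGINT"); // stands in for process.once(...)
}

function initOnce(): FakeSdk {
  if (cachedSdk) return cachedSdk; // repeat call: hand back the started instance
  sdksStarted += 1;
  const sdk: FakeSdk = { id: sdksStarted }; // stands in for constructing and starting the SDK
  registerSignalHandlersOnce();
  cachedSdk = sdk;
  return sdk;
}

const first = initOnce();
const second = initOnce();
```

One consequence visible in the diff itself: because the cached instance is returned before the options are read, options passed to a second `initHeystack` call have no effect.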
package/dist/workers.d.ts CHANGED
@@ -1,4 +1,5 @@
1
1
  import { type Span } from "@opentelemetry/api";
2
+ import { type Context, type ContextManager } from "@opentelemetry/api";
2
3
  import { BasicTracerProvider, type ReadableSpan, type SpanExporter } from "@opentelemetry/sdk-trace-base";
3
4
  import { type HeystackOptions } from "./core.js";
4
5
  declare const ExportResultCode: {
@@ -53,12 +54,21 @@ interface OtlpTracesPayload {
53
54
  * hence the same resource, so we emit a single resourceSpans entry.
54
55
  */
55
56
  export declare function serializeSpans(spans: ReadableSpan[]): OtlpTracesPayload;
57
+ /**
58
+ * Test-only helper: run the self-span attribute check directly against a plain
59
+ * attribute bag + ingest hostname, without constructing a ReadableSpan. The
60
+ * `ingestHost` should be a bare hostname (lower-case, no port), matching what
61
+ * the exporter derives via `safeHostname(cfg.url)`.
62
+ */
63
+ export declare function isSelfSpanForTest(attrs: Record<string, unknown>, ingestHost: string): boolean;
56
64
  /**
57
65
  * A WinterCG-compatible OTLP/JSON span exporter. POSTs ended spans to the
58
66
  * Heystack ingest using the platform `fetch` — no Node built-ins.
59
67
  */
60
68
  export declare class HeystackSpanExporter implements SpanExporter {
61
69
  private readonly url;
70
+ /** Hostname (no port) of the ingest endpoint, used to drop self-trace spans. */
71
+ private readonly ingestHost;
62
72
  private readonly headers;
63
73
  private shutdownState;
64
74
  /**
@@ -69,8 +79,27 @@ export declare class HeystackSpanExporter implements SpanExporter {
69
79
  * dropped on fast-responding Workers/edge handlers.
70
80
  */
71
81
  private readonly pending;
82
+ /**
83
+ * Optional `waitUntil` hook. When set, each in-flight export `fetch` is also
84
+ * handed to it so the runtime keeps the isolate alive until the POST
85
+ * completes — without any per-request hook in app code. This is the reliable
86
+ * delivery path on Next-on-OpenNext, where there is no `ExecutionContext`
87
+ * passed to `registerHeystack` but the export DOES run inside a Cloudflare
88
+ * request context whose `ctx.waitUntil` we can borrow. Falls back silently to
89
+ * the `pending` set + manual `flushHeystack()` when absent/unavailable.
90
+ */
91
+ waitUntil?: (p: Promise<unknown>) => void;
72
92
  constructor(options: HeystackOptions);
73
93
  export(spans: ReadableSpan[], resultCallback: (result: ExportResult) => void): void;
94
+ /**
95
+ * Hand `p` to the OpenNext Cloudflare request context's `ctx.waitUntil`.
96
+ * Resolves the accessor first (awaiting its in-flight load if needed) so an
97
+ * export that fired before the dynamic import settled is still covered.
98
+ */
99
+ private handOffToCloudflareContext;
100
+ private warnedWaitUntilFailure;
101
+ /** Log a waitUntil/accessor failure exactly once; never throw. */
102
+ private warnWaitUntilFailed;
74
103
  shutdown(): Promise<void>;
75
104
  /**
76
105
  * Resolve only once every in-flight export fetch has settled. This is the
@@ -85,6 +114,14 @@ export interface WorkersConfig {
85
114
  /** Defaults to env.HEYSTACK_API_KEY at request time if omitted. */
86
115
  apiKey?: string;
87
116
  endpoint?: string;
117
+ /**
118
+ * Optional override to keep the isolate alive until each export `fetch`
119
+ * completes. When provided this takes priority over the auto-detected
120
+ * OpenNext Cloudflare request context. Typically `ctx.waitUntil` from your
121
+ * Worker's `ExecutionContext`. If omitted on Next-on-OpenNext, the exporter
122
+ * automatically borrows the request context's `ctx.waitUntil`.
123
+ */
124
+ waitUntil?: (p: Promise<unknown>) => void;
88
125
  }
89
126
  /**
90
127
  * A `BasicTracerProvider` with the underlying `HeystackSpanExporter` attached so
@@ -105,15 +142,36 @@ export type HeystackTracerProvider = BasicTracerProvider & {
105
142
  * `flushHeystack()`), not just `provider.forceFlush()`.
106
143
  */
107
144
  export declare function createTracerProvider(config: HeystackOptions): HeystackTracerProvider;
145
+ /**
146
+ * A minimal SYNCHRONOUS, stack-based ContextManager — the registered manager for
147
+ * the /workers entry (no `node:async_hooks`, so it works on any WinterCG runtime).
148
+ * It makes `context.with()` propagate synchronously, which is enough for the
149
+ * exporter's `suppressTracing` to take effect and for the belt-and-suspenders
150
+ * self-span filter — but it does NOT carry context across `await` boundaries (so
151
+ * cross-`await` parent linking and per-request isolation are best-effort).
152
+ */
153
+ export declare class SyncStackContextManager implements ContextManager {
154
+ private _stack;
155
+ active(): Context;
156
+ with<A extends unknown[], F extends (...args: A) => ReturnType<F>>(ctx: Context, fn: F, thisArg?: ThisParameterType<F>, ...args: A): ReturnType<F>;
157
+ bind<T>(_ctx: Context, target: T): T;
158
+ enable(): this;
159
+ disable(): this;
160
+ }
161
+ /** Reset the context-manager registration guard. Internal/testing helper. */
162
+ export declare function __resetContextManager(): void;
108
163
  /**
109
164
  * Register Heystack as the global tracer provider on a Workers/edge runtime
110
165
  * (workerd). Spans from the host framework (e.g. Next.js) export over fetch.
111
166
  * Use this instead of @heystack/otel/node when running on Cloudflare/edge.
112
167
  *
113
- * FLUSH: there is no per-request `ExecutionContext` on this path, so spans are
114
- * NOT auto-flushed. To guarantee delivery on workerd, call `flushHeystack()`
115
- * (which awaits the export fetch) from a response hook or hand it to
116
- * `ctx.waitUntil(flushHeystack())` if you do have a ctx. Returns the provider.
168
+ * FLUSH: on Next-on-OpenNext (where `@opennextjs/cloudflare` is present) the
169
+ * exporter automatically borrows the Cloudflare request context's
170
+ * `ctx.waitUntil` so the export POST completes before the isolate is torn down
171
+ * (no app hook needed). For other workerd setups without it, call
172
+ * `flushHeystack()` (which awaits the export fetch) from a response hook, hand
173
+ * it to `ctx.waitUntil(flushHeystack())`, or pass an explicit `waitUntil` in
174
+ * the config (highest priority). Returns the provider.
117
175
  */
118
176
  export declare function initHeystackWorkers(config: WorkersConfig & {
119
177
  apiKey: string;
@@ -131,8 +189,18 @@ export declare function flushHeystack(): Promise<void>;
131
189
  export declare function __resetProvider(): void;
132
190
  /** Reset the once-only no-key warning. Internal/testing helper. */
133
191
  export declare function __resetWarnings(): void;
134
- interface FetchHandler<E> {
135
- fetch: (req: Request, env: E, ctx: ExecutionContext) => Promise<Response> | Response;
192
+ /**
193
+ * The shape of a Worker's default export. `fetch` is the common entrypoint, but
194
+ * a Worker may also export `queue` / `scheduled` / `tail` (and arbitrary other
195
+ * siblings). `instrument()` forwards ALL of these untouched except `fetch`
196
+ * (always traced) and `queue` / `scheduled` (traced when present) — so wrapping
197
+ * a Worker never drops a handler Cloudflare requires for deploy.
198
+ */
199
+ interface WorkerHandler<E> {
200
+ fetch?: (req: Request, env: E, ctx: ExecutionContext) => Promise<Response> | Response;
201
+ queue?: (batch: MessageBatch, env: E, ctx: ExecutionContext) => Promise<void> | void;
202
+ scheduled?: (controller: ScheduledController, env: E, ctx: ExecutionContext) => Promise<void> | void;
203
+ [key: string]: unknown;
136
204
  }
137
205
  /**
138
206
  * Wrap a Worker's default export so every request is auto-traced with a SERVER
@@ -150,10 +218,40 @@ interface FetchHandler<E> {
150
218
  * { service: "my-worker" },
151
219
  * );
152
220
  *
221
+ * TRACE TREE: `instrument()` sets up the singleton GLOBAL tracer provider and
222
+ * creates the root span via the global tracer (`trace.getTracer("heystack")`).
223
+ * This means nested spans created through the global `trace.getTracer()` API
224
+ * (framework / library / your own manual spans) also flow to the exporter — you
225
+ * get a trace tree, not a lone root span.
226
+ *
227
+ * ALL HANDLERS FORWARDED: the returned object spreads `handler`, so sibling
228
+ * handlers a Worker exports (`tail`, `email`, etc.) are preserved — Cloudflare
229
+ * won't reject the deploy for a missing handler. `fetch` is always replaced with
230
+ * a traced wrapper; `queue` and `scheduled` are wrapped (and traced) when
231
+ * present; everything else is forwarded untouched.
232
+ *
153
233
  * If no API key is available (neither `config.apiKey` nor
154
234
  * `env.HEYSTACK_API_KEY`), the handler runs untraced.
155
235
  */
156
- export declare function instrument<E = unknown>(handler: FetchHandler<E>, config: WorkersConfig): {
236
+ /**
237
+ * The instrumented result: the original handler with the traced entrypoints
238
+ * normalised to their full Worker signatures (so callers/tests can invoke them
239
+ * with `(arg, env, ctx)`), and every other sibling handler forwarded as-is. A
240
+ * traced key is present (and non-optional) exactly when it exists on the input
241
+ * handler, so a `{ fetch }` Worker yields a non-optional `fetch`.
242
+ */
243
+ type Instrumented<E, H> = Omit<H, "fetch" | "queue" | "scheduled"> & (H extends {
244
+ fetch: unknown;
245
+ } ? {
157
246
  fetch: (req: Request, env: E, ctx: ExecutionContext) => Promise<Response>;
158
- };
247
+ } : unknown) & (H extends {
248
+ queue: unknown;
249
+ } ? {
250
+ queue: (batch: MessageBatch, env: E, ctx: ExecutionContext) => Promise<void>;
251
+ } : unknown) & (H extends {
252
+ scheduled: unknown;
253
+ } ? {
254
+ scheduled: (controller: ScheduledController, env: E, ctx: ExecutionContext) => Promise<void>;
255
+ } : unknown);
256
+ export declare function instrument<E = unknown, H extends WorkerHandler<E> = WorkerHandler<E>>(handler: H, config: WorkersConfig): Instrumented<E, H>;
159
257
  export type { Span };
package/dist/workers.js CHANGED
@@ -8,6 +8,8 @@
8
8
  // ships its own OTLP/JSON-over-fetch span exporter so it runs on Workers/Edge
9
9
  // where the Node SDK cannot.
10
10
  import { context, trace, SpanKind, SpanStatusCode, } from "@opentelemetry/api";
11
+ import { suppressTracing } from "@opentelemetry/core";
12
+ import { ROOT_CONTEXT } from "@opentelemetry/api";
11
13
  import { Resource } from "@opentelemetry/resources";
12
14
  import { BasicTracerProvider, SimpleSpanProcessor, } from "@opentelemetry/sdk-trace-base";
13
15
  import { ATTR_SERVICE_NAME } from "@opentelemetry/semantic-conventions";
@@ -17,6 +19,37 @@ import { buildExporterConfig } from "./core.js";
17
19
  // transitive dep of sdk-trace-base and isn't reliably resolvable, and keeping it
18
20
  // out guarantees no extra (potentially node-platform) code in the bundle.
19
21
  const ExportResultCode = { SUCCESS: 0, FAILED: 1 };
22
+ let _getCloudflareContext;
23
+ let _cfAccessorLoadPromise;
24
+ /**
25
+ * Best-effort, once-only resolve of OpenNext's `getCloudflareContext`. The
26
+ * import is async, so the FIRST export(s) may run before it settles — see
27
+ * `export()`, which awaits this promise so those early exports still get handed
28
+ * to `ctx.waitUntil` once the accessor resolves (rather than relying solely on
29
+ * an explicit `flushHeystack()` that the OpenNext path doesn't call). Started
30
+ * eagerly at module init so the window is as small as possible.
31
+ */
32
+ function loadCloudflareContextAccessor() {
33
+ if (_cfAccessorLoadPromise)
34
+ return _cfAccessorLoadPromise;
35
+ _cfAccessorLoadPromise = (async () => {
36
+ try {
37
+ const spec = "@opennextjs/cloudflare";
38
+ const m = (await import(/* @vite-ignore */ /* webpackIgnore: true */ spec));
39
+ _getCloudflareContext = m.getCloudflareContext;
40
+ }
41
+ catch {
42
+ // Not on OpenNext (or the package isn't installed) — fall back silently to
43
+ // the `pending` set + manual flush. This is the common, expected case off
44
+ // OpenNext, so it is intentionally NOT logged.
45
+ }
46
+ })();
47
+ return _cfAccessorLoadPromise;
48
+ }
49
+ // Kick off the accessor resolution eagerly at module load so it's ready (or
50
+ // nearly so) by the time the first export runs, minimising the race where an
51
+ // early export can't borrow `ctx.waitUntil`.
52
+ void loadCloudflareContextAccessor();
20
53
  /** Convert an OTel HrTime `[seconds, nanos]` tuple to a nanosecond string. */
21
54
  function hrTimeToUnixNano(time) {
22
55
  // BigInt math keeps full nanosecond precision without float rounding.
@@ -105,6 +138,98 @@ export function serializeSpans(spans) {
105
138
  };
106
139
  }
107
140
  // ---------------------------------------------------------------------------
141
+ // Self-span filtering (feedback-loop guard)
142
+ //
143
+ // On Next/OpenNext the host auto-instruments outbound `fetch`, so the exporter's
144
+ // own POST to `/v1/traces` becomes a CLIENT span → exported → re-captured → a
145
+ // sustained loop. The primary defence is exporting under a suppressed context
146
+ // (see `export()`), but as belt-and-suspenders we also drop any span that
147
+ // targets the ingest origin, so an upstream instrumentation that ignores
148
+ // suppression still can't feed the loop.
149
+ // ---------------------------------------------------------------------------
150
+ /**
151
+ * Parse the hostname (no port) out of a URL, lower-cased; empty string if it
152
+ * can't be parsed. We compare on hostname rather than `host` so that an ingest
153
+ * URL like `ingest.heystack.dev` matches a captured span attribute of
154
+ * `ingest.heystack.dev:443` (and vice-versa).
155
+ */
156
+ function safeHostname(url) {
157
+ try {
158
+ return new URL(url).hostname.toLowerCase();
159
+ }
160
+ catch {
161
+ return "";
162
+ }
163
+ }
164
+ /**
165
+ * Strip a trailing `:port` from a bare host attribute and lower-case it, so a
166
+ * host-only attr like `ingest.heystack.dev:443` compares equal to the ingest
167
+ * hostname. IPv6 literals (`[::1]:443`) keep their bracketed form. Returns ""
168
+ * for anything that isn't a non-empty string.
169
+ */
170
+ function hostnameOf(hostAttr) {
171
+ if (typeof hostAttr !== "string" || hostAttr === "")
172
+ return "";
173
+ const v = hostAttr.trim();
174
+ // Bracketed IPv6, optionally with a port: `[::1]` or `[::1]:443`.
175
+ if (v.startsWith("[")) {
176
+ const close = v.indexOf("]");
177
+ if (close !== -1)
178
+ return v.slice(0, close + 1).toLowerCase();
179
+ return v.toLowerCase();
180
+ }
181
+ // Strip a single trailing :port (host:port). A bare hostname has no colon.
182
+ const colon = v.lastIndexOf(":");
183
+ if (colon !== -1 && v.indexOf(":") === colon) {
184
+ return v.slice(0, colon).toLowerCase();
185
+ }
186
+ return v.toLowerCase();
187
+ }
188
+ /** Attributes that carry a full HTTP URL on a CLIENT/SERVER span. */
189
+ const HTTP_URL_ATTRS = ["url.full", "http.url"];
190
+ /** Host-only attributes (host[:port], no scheme/path). */
191
+ const HTTP_HOST_ATTRS = [
192
+ "server.address",
193
+ "net.peer.name",
194
+ "net.peer.hostname",
195
+ "http.host",
196
+ "peer.address",
197
+ ];
198
+ /**
199
+ * True if `span` looks like a request to the configured ingest origin — i.e. it
200
+ * is (or could be) the exporter's own self-trace. For full-URL attributes we
201
+ * parse the URL and compare its `.hostname` (case-insensitive, port stripped) so
202
+ * a sibling domain like `myingest.heystack.dev` is NOT a false positive and an
203
+ * explicit port like `ingest.heystack.dev:443` IS matched. For host-only attrs
204
+ * we strip any `:port` and compare hostname equality.
205
+ */
206
+ function isSelfSpanAttrs(attrs, ingestHost) {
207
+ if (!ingestHost)
208
+ return false;
209
+ for (const key of HTTP_URL_ATTRS) {
210
+ const v = attrs[key];
211
+ if (typeof v === "string" && safeHostname(v) === ingestHost)
212
+ return true;
213
+ }
214
+ for (const key of HTTP_HOST_ATTRS) {
215
+ if (hostnameOf(attrs[key]) === ingestHost)
216
+ return true;
217
+ }
218
+ return false;
219
+ }
220
+ function isSelfSpan(span, ingestHost) {
221
+ return isSelfSpanAttrs(span.attributes, ingestHost);
222
+ }
223
+ /**
224
+ * Test-only helper: run the self-span attribute check directly against a plain
225
+ * attribute bag + ingest hostname, without constructing a ReadableSpan. The
226
+ * `ingestHost` should be a bare hostname (lower-case, no port), matching what
227
+ * the exporter derives via `safeHostname(cfg.url)`.
228
+ */
229
+ export function isSelfSpanForTest(attrs, ingestHost) {
230
+ return isSelfSpanAttrs(attrs, ingestHost);
231
+ }
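The host-only comparison described above can be sketched in isolation. The snippet below re-implements the port-stripping rule from the `hostnameOf` comment as a standalone function named `stripPort` (a hypothetical name) and runs it over a few sample values chosen for illustration. It is a sketch of the rule under those assumptions, not the package's exported API.

```typescript
// Standalone sketch of the host-attribute normalisation: lower-case the value
// and strip a single trailing `:port`, keeping bracketed IPv6 literals intact.
function stripPort(hostAttr: unknown): string {
  if (typeof hostAttr !== "string" || hostAttr === "") return "";
  const v = hostAttr.trim();
  if (v.startsWith("[")) {
    // `[::1]` or `[::1]:443`: keep everything up to the closing bracket.
    const close = v.indexOf("]");
    return (close !== -1 ? v.slice(0, close + 1) : v).toLowerCase();
  }
  // Only strip when there is exactly one colon (host:port).
  const colon = v.lastIndexOf(":");
  if (colon !== -1 && v.indexOf(":") === colon) {
    return v.slice(0, colon).toLowerCase();
  }
  return v.toLowerCase();
}

// Illustrative inputs; the ingest hostname is taken from the comments above.
const ingestHost = "ingest.heystack.dev";
const samples = [
  "ingest.heystack.dev:443", // explicit port: should match
  "INGEST.heystack.dev",     // case differs: should match
  "myingest.heystack.dev",   // sibling domain: should NOT match
  "[::1]:443",               // bracketed IPv6: should NOT match
];
const selfHosts = samples.filter((h) => stripPort(h) === ingestHost);
```

Because the comparison is strict equality on the normalised hostname rather than a substring test, the sibling domain is not treated as a self-span.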
232
+ // ---------------------------------------------------------------------------
108
233
  // Exporter
109
234
  // ---------------------------------------------------------------------------
110
235
  /**
@@ -113,6 +238,8 @@ export function serializeSpans(spans) {
113
238
  */
114
239
  export class HeystackSpanExporter {
115
240
  url;
241
+ /** Hostname (no port) of the ingest endpoint, used to drop self-trace spans. */
242
+ ingestHost;
116
243
  headers;
117
244
  shutdownState = false;
118
245
  /**
@@ -123,9 +250,20 @@ export class HeystackSpanExporter {
123
250
  * dropped on fast-responding Workers/edge handlers.
124
251
  */
125
252
  pending = new Set();
253
+ /**
254
+ * Optional `waitUntil` hook. When set, each in-flight export `fetch` is also
255
+ * handed to it so the runtime keeps the isolate alive until the POST
256
+ * completes — without any per-request hook in app code. This is the reliable
257
+ * delivery path on Next-on-OpenNext, where there is no `ExecutionContext`
258
+ * passed to `registerHeystack` but the export DOES run inside a Cloudflare
259
+ * request context whose `ctx.waitUntil` we can borrow. Falls back silently to
260
+ * the `pending` set + manual `flushHeystack()` when absent/unavailable.
261
+ */
262
+ waitUntil;
126
263
  constructor(options) {
127
264
  const cfg = buildExporterConfig(options);
128
265
  this.url = cfg.url;
266
+ this.ingestHost = safeHostname(cfg.url);
129
267
  this.headers = {
130
268
  ...cfg.headers,
131
269
  "content-type": "application/json",
@@ -143,11 +281,25 @@ export class HeystackSpanExporter {
143
281
  resultCallback({ code: ExportResultCode.SUCCESS });
144
282
  return;
145
283
  }
146
- const body = JSON.stringify(serializeSpans(spans));
284
+ // Belt-and-suspenders: drop the exporter's own self-trace spans (any span
285
+ // targeting the ingest origin) so an upstream instrumentation that ignores
286
+ // our suppressed context still can't feed the feedback loop.
287
+ const exportable = spans.filter((s) => !isSelfSpan(s, this.ingestHost));
288
+ if (exportable.length === 0) {
289
+ resultCallback({ code: ExportResultCode.SUCCESS });
290
+ return;
291
+ }
292
+ const body = JSON.stringify(serializeSpans(exportable));
147
293
  // Build the fetch chain as a promise we retain, so forceFlush() can await
148
294
  // the actual network write. It resolves (never rejects) once the POST has
149
295
  // completed (success or fail) and resultCallback has been invoked.
150
- const p = fetch(this.url, { method: "POST", headers: this.headers, body })
296
+ //
297
+ // The POST runs inside a tracing-suppressed context so that host fetch
298
+ // auto-instrumentation (e.g. Next/OpenNext) does NOT create a CLIENT span
299
+ // for it — which would otherwise be exported and re-captured, a sustained
300
+ // feedback loop.
301
+ const p = context
302
+ .with(suppressTracing(context.active()), () => fetch(this.url, { method: "POST", headers: this.headers, body }))
151
303
  .then((res) => {
152
304
  if (res.ok) {
153
305
  resultCallback({ code: ExportResultCode.SUCCESS });
@@ -167,6 +319,69 @@ export class HeystackSpanExporter {
167
319
  });
168
320
  this.pending.add(p);
169
321
  p.finally(() => this.pending.delete(p));
322
+ // Keep the isolate alive until the POST completes. Priority:
323
+ // 1. an explicit `waitUntil` override (set via initHeystackWorkers).
324
+ // 2. the OpenNext Cloudflare request context's `ctx.waitUntil` — the
325
+ // export runs during request handling, so this is available there.
326
+ // Either path makes delivery reliable on workerd/OpenNext with no app hook.
327
+ // The `pending` set + `flushHeystack()` remain as the explicit fallback.
328
+ if (this.waitUntil) {
329
+ try {
330
+ this.waitUntil(p);
331
+ }
332
+ catch (error) {
333
+ this.warnWaitUntilFailed("explicit waitUntil", error);
334
+ }
335
+ }
336
+ else {
337
+ // The accessor loads asynchronously, so the first export(s) may run before
338
+ // it resolves. Borrow `ctx.waitUntil` synchronously if it's already
339
+ // available; otherwise await the in-flight accessor load so even those
340
+ // early exports get handed to waitUntil once it resolves. `p` is already
341
+ // in `pending`, so a manual `flushHeystack()` covers it regardless.
342
+ this.handOffToCloudflareContext(p);
343
+ }
344
+ }
345
+ /**
346
+ * Hand `p` to the OpenNext Cloudflare request context's `ctx.waitUntil`.
347
+ * Resolves the accessor first (awaiting its in-flight load if needed) so an
348
+ * export that fired before the dynamic import settled is still covered.
349
+ */
350
+ handOffToCloudflareContext(p) {
351
+ const attempt = () => {
352
+ if (!_getCloudflareContext)
353
+ return;
354
+ try {
355
+ const cf = _getCloudflareContext();
356
+ cf?.ctx?.waitUntil?.(p);
357
+ }
358
+ catch (error) {
359
+ // #15: a real failure inside getCloudflareContext (e.g. called outside a
360
+ // request context) was previously swallowed silently. Name it once so it
361
+ // is diagnosable, then fall back to the `pending` set + manual flush.
362
+ this.warnWaitUntilFailed("getCloudflareContext", error);
363
+ }
364
+ };
365
+ if (_getCloudflareContext) {
366
+ attempt();
367
+ }
368
+ else {
369
+ // Accessor not resolved yet — await its load, then try once.
370
+ void loadCloudflareContextAccessor().then(attempt);
371
+ }
372
+ }
373
+ warnedWaitUntilFailure = false;
374
+ /** Log a waitUntil/accessor failure exactly once; never throw. */
375
+ warnWaitUntilFailed(where, error) {
376
+ if (this.warnedWaitUntilFailure)
377
+ return;
378
+ this.warnedWaitUntilFailure = true;
379
+ const msg = error instanceof Error ? error.message : String(error);
380
+ // console.debug so it's quiet by default but visible when debugging dropped
381
+ // traces; the export still completes via the `pending` set + manual flush.
382
+ console.debug(`[heystack] could not hand export to ${where} (${msg}); ` +
383
+ "falling back to pending-set flush. Trace delivery may be best-effort " +
384
+ "unless you call flushHeystack() or pass an explicit waitUntil.");
170
385
  }
171
386
  shutdown() {
172
387
  return this.forceFlush().then(() => {
@@ -206,27 +421,125 @@ export function createTracerProvider(config) {
206
421
  // Global tracer provider registration (for host frameworks, e.g. Next.js)
207
422
  // ---------------------------------------------------------------------------
208
423
  let _provider = null;
424
+ // ---------------------------------------------------------------------------
425
+ // Context manager registration (makes suppressTracing() actually work)
426
+ //
427
+ // `context.with(...)` is a NO-OP unless a ContextManager is registered with the
428
+ // global OTel API. Without one, `suppressTracing(context.active())` produces a
429
+ // context that is never made active, so the exporter's POST is NOT suppressed
430
+ // in production and host fetch auto-instrumentation can re-trace it (feedback
431
+ // loop). We therefore register a manager exactly ONCE in `ensureGlobalProvider`.
432
+ //
433
+ // We register a dependency-free SYNCHRONOUS stack manager (below). Deliberately
434
+ // NOT AsyncLocalStorageContextManager: that statically imports `node:async_hooks`,
435
+ // which would break `import "@heystack/otel/workers"` on a bare workerd without
436
+ // `nodejs_compat` (and on other WinterCG runtimes) — defeating the whole point of
437
+ // this entry being node-builtin-free. The sync manager covers the critical path:
438
+ // the exporter's POST runs synchronously inside the suppressed `context.with`, so
439
+ // `suppressTracing` takes effect. Trade-off: no cross-`await` context propagation,
440
+ // so deep nested-span parenting is limited on the edge (documented).
441
+ // ---------------------------------------------------------------------------
209
442
  /**
210
- * Register Heystack as the global tracer provider on a Workers/edge runtime
211
- * (workerd). Spans from the host framework (e.g. Next.js) export over fetch.
212
- * Use this instead of @heystack/otel/node when running on Cloudflare/edge.
443
+ * A minimal SYNCHRONOUS, stack-based ContextManager: the registered manager for
444
+ * the /workers entry (no `node:async_hooks`, so it works on any WinterCG runtime).
445
+ * It makes `context.with()` propagate synchronously, which is enough for the
446
+ * exporter's `suppressTracing` to take effect and for the belt-and-suspenders
447
+ * self-span filter — but it does NOT carry context across `await` boundaries (so
448
+ * cross-`await` parent linking and per-request isolation are best-effort).
449
+ */
450
+ export class SyncStackContextManager {
451
+ _stack = [];
452
+ active() {
453
+ return this._stack[this._stack.length - 1] ?? ROOT_CONTEXT;
454
+ }
455
+ with(ctx, fn, thisArg, ...args) {
456
+ this._stack.push(ctx);
457
+ try {
458
+ return fn.call(thisArg, ...args);
459
+ }
460
+ finally {
461
+ this._stack.pop();
462
+ }
463
+ }
464
+ bind(_ctx, target) {
465
+ return target;
466
+ }
467
+ enable() {
468
+ return this;
469
+ }
470
+ disable() {
471
+ this._stack = [];
472
+ return this;
473
+ }
474
+ }
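The trade-off stated above (synchronous propagation works, cross-`await` propagation does not) can be seen with a dependency-free sketch. `StackManagerSketch` and the `Ctx` stand-in below are hypothetical names that mirror only the `active()`/`with()` pair of the class above. They do not use the OpenTelemetry API.

```typescript
// Stand-in for an OTel Context: any value works for the sketch.
type Ctx = { readonly name: string };
const ROOT: Ctx = { name: "root" };

class StackManagerSketch {
  private stack: Ctx[] = [];

  active(): Ctx {
    return this.stack[this.stack.length - 1] ?? ROOT;
  }

  with<T>(ctx: Ctx, fn: () => T): T {
    this.stack.push(ctx);
    try {
      return fn();
    } finally {
      // Popped as soon as `fn` returns, even if it returned a pending promise.
      this.stack.pop();
    }
  }
}

const manager = new StackManagerSketch();
const suppressed: Ctx = { name: "suppressed" };

// The synchronous part of the callback sees the pushed context.
const syncSeen: string = manager.with(suppressed, () => manager.active().name);

// Code that resumes after an `await` runs after `with` has already popped,
// so it falls back to the root context.
const asyncSeen: Promise<string> = manager.with(suppressed, async () => {
  await Promise.resolve();
  return manager.active().name;
});
```

Here `syncSeen` is `"suppressed"`, while `asyncSeen` resolves to `"root"`. That is consistent with the comments above: a `fetch` call issued synchronously inside `context.with` is covered, whereas spans started after an `await` are not linked to their parent.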
475
+ let _contextManagerRegistered = false;
476
+ /**
477
+ * Register a global OTel ContextManager exactly once, so that
478
+ * `context.with(suppressTracing(...))` in the exporter is actually honoured —
479
+ * otherwise suppression is a no-op and the exporter's POST can be re-traced.
213
480
  *
214
- * FLUSH: there is no per-request `ExecutionContext` on this path, so spans are
215
- * NOT auto-flushed. To guarantee delivery on workerd, call `flushHeystack()`
216
- * (which awaits the export fetch) from a response hook — or hand it to
217
- * `ctx.waitUntil(flushHeystack())` if you do have a ctx. Returns the provider.
481
+ * We register a synchronous, dependency-free stack manager. This keeps the
482
+ * /workers entry WinterCG-safe (no `node:async_hooks` import, so it works on bare
483
+ * workerd WITHOUT nodejs_compat, Deno, Bun, etc.). It fully covers the critical
484
+ * path (the export fetch runs synchronously inside the suppressed `context.with`,
485
+ * and per-request root spans). Trade-off: it does not propagate context across
486
+ * `await` boundaries, so deep nested-span parenting is limited on the edge — an
487
+ * acceptable, documented limitation (workerd has no async context manager by
488
+ * default regardless).
218
489
  */
219
- export function initHeystackWorkers(config) {
490
+ function ensureContextManager() {
491
+ if (_contextManagerRegistered)
492
+ return;
493
+ _contextManagerRegistered = true;
494
+ context.setGlobalContextManager(new SyncStackContextManager().enable());
495
+ }
496
+ /** Reset the context-manager registration guard. Internal/testing helper. */
497
+ export function __resetContextManager() {
498
+ _contextManagerRegistered = false;
499
+ }
500
+ /**
501
+ * Build (once) and register the singleton global tracer provider. Wires the
502
+ * exporter's `waitUntil` (explicit override > nothing here; the auto-detected
503
+ * OpenNext context is resolved lazily inside the exporter) and kicks off the
504
+ * best-effort load of OpenNext's `getCloudflareContext` accessor so the
505
+ * exporter can borrow `ctx.waitUntil` during request handling. Shared by
506
+ * `initHeystackWorkers` and `instrument`.
507
+ */
508
+ function ensureGlobalProvider(config) {
220
509
  if (_provider)
221
510
  return _provider;
222
511
  _provider = createTracerProvider(config);
223
- // register as the global provider so framework spans flow to our exporter.
224
- // BasicTracerProvider.register() also wires context/propagation, but on
225
- // workerd we only need the global tracer provider; set it directly so it's
226
- // deterministic and doesn't pull in a context manager that may not run here.
512
+ if (config.waitUntil)
513
+ _provider.heystackExporter.waitUntil = config.waitUntil;
514
+ // Best-effort: resolve OpenNext's context accessor so the exporter can borrow
515
+ // `ctx.waitUntil` during request handling. Guarded; fails closed when absent.
516
+ void loadCloudflareContextAccessor();
517
+ // Register as the global provider so framework / global-API spans flow to our
518
+ // exporter. We set it directly (rather than provider.register()) so it's
519
+ // deterministic and so we control exactly which ContextManager is registered.
227
520
  trace.setGlobalTracerProvider(_provider);
521
+ // Register a ContextManager (once) so `context.with(suppressTracing(...))` in
522
+ // the exporter actually takes effect — without one, `context.with` is a no-op
523
+ // and suppression silently does nothing in production.
524
+ ensureContextManager();
228
525
  return _provider;
229
526
  }
527
+ /**
528
+ * Register Heystack as the global tracer provider on a Workers/edge runtime
529
+ * (workerd). Spans from the host framework (e.g. Next.js) export over fetch.
530
+ * Use this instead of @heystack/otel/node when running on Cloudflare/edge.
531
+ *
532
+ * FLUSH: on Next-on-OpenNext (where `@opennextjs/cloudflare` is present) the
533
+ * exporter automatically borrows the Cloudflare request context's
534
+ * `ctx.waitUntil` so the export POST completes before the isolate is torn down
535
+ * — no app hook needed. For other workerd setups without it, call
536
+ * `flushHeystack()` (which awaits the export fetch) from a response hook, hand
537
+ * it to `ctx.waitUntil(flushHeystack())`, or pass an explicit `waitUntil` in
538
+ * the config (highest priority). Returns the provider.
539
+ */
540
+ export function initHeystackWorkers(config) {
541
+ return ensureGlobalProvider(config);
542
+ }
230
543
  /**
231
544
  * Force-flush any pending spans AND wait for the export network write to
232
545
  * complete. This awaits both the OTel provider's `forceFlush()` (drains the
@@ -260,40 +573,60 @@ function warnOnceNoKey() {
260
573
  export function __resetWarnings() {
261
574
  warnedNoKey = false;
262
575
  }
263
- /**
264
- * Wrap a Worker's default export so every request is auto-traced with a SERVER
265
- * span.
266
- *
267
- * FLUSH (CRITICAL on Workers/edge): the export is a `fetch()` POST. After
268
- * `span.end()` we `ctx.waitUntil` a promise that awaits BOTH the provider's
269
- * span processor AND the exporter's in-flight fetch, so the network write
270
- * completes before the isolate is torn down. Without this, fast-responding
271
- * handlers return before the POST finishes and the trace is silently dropped.
272
- *
273
- * import { instrument } from "@heystack/otel/workers";
274
- * export default instrument(
275
- * { async fetch(req, env, ctx) { return new Response("ok"); } },
276
- * { service: "my-worker" },
277
- * );
278
- *
279
- * If no API key is available (neither `config.apiKey` nor
280
- * `env.HEYSTACK_API_KEY`), the handler runs untraced.
281
- */
282
576
  export function instrument(handler, config) {
283
- return {
284
- async fetch(req, env, ctx) {
285
- const apiKey = config.apiKey ?? env?.HEYSTACK_API_KEY;
286
- // No key run untraced (warn once).
287
- if (!apiKey) {
288
- warnOnceNoKey();
289
- return handler.fetch(req, env, ctx);
290
- }
291
- const provider = createTracerProvider({
292
- apiKey,
293
- service: config.service,
294
- endpoint: config.endpoint,
295
- });
296
- const tracer = provider.getTracer("@heystack/otel/workers");
577
+ // Resolve the API key + set up (once) the global provider. Returns null when
578
+ // no key is available so callers can run the handler untraced.
579
+ const setup = (env) => {
580
+ const apiKey = config.apiKey ?? env?.HEYSTACK_API_KEY;
581
+ if (!apiKey) {
582
+ warnOnceNoKey();
583
+ return null;
584
+ }
585
+ // The global provider lets spans created via the global `trace.getTracer()`
586
+ // API — nested framework/library/manual spans — export too, yielding a
587
+ // trace tree rather than a lone root span.
588
+ const provider = ensureGlobalProvider({
589
+ apiKey,
590
+ service: config.service,
591
+ endpoint: config.endpoint,
592
+ waitUntil: config.waitUntil,
593
+ });
594
+ return { provider, tracer: trace.getTracer("heystack") };
595
+ };
596
+ // Drain BOTH the provider's span processor AND the exporter's in-flight fetch
597
+ // via ctx.waitUntil. Awaiting only provider.forceFlush() would return before
598
+ // the export POST completes, letting the isolate be torn down and silently
599
+ // dropping the trace.
600
+ //
601
+ // #25: the drain is raced against an ~8s timeout so a hung forceFlush (e.g. an
602
+ // ingest that never responds) can't pin the isolate until the platform CPU
603
+ // timeout kills it. On timeout we stop waiting; the POST is still in-flight and
604
+ // may yet complete, we just don't block the isolate on it indefinitely.
605
+ const DRAIN_TIMEOUT_MS = 8_000;
606
+ const drain = (provider, ctx) => {
607
+ const drained = (async () => {
608
+ await provider.forceFlush().catch(() => { });
609
+ await provider.heystackExporter.forceFlush().catch(() => { });
610
+ })();
611
+ let timer;
612
+ const timeout = new Promise((resolve) => {
613
+ timer = setTimeout(resolve, DRAIN_TIMEOUT_MS);
614
+ });
615
+ ctx.waitUntil(Promise.race([drained, timeout]).finally(() => {
616
+ if (timer)
617
+ clearTimeout(timer);
618
+ }));
619
+ };
620
+ // Start from a shallow copy so EVERY sibling handler (tail, email, …) is
621
+ // forwarded untouched; we only override fetch/queue/scheduled below.
622
+ const wrapped = { ...handler };
623
+ const originalFetch = handler.fetch?.bind(handler);
624
+ if (originalFetch) {
625
+ wrapped.fetch = async (req, env, ctx) => {
626
+ const s = setup(env);
627
+ if (!s)
628
+ return originalFetch(req, env, ctx);
629
+ const { provider, tracer } = s;
297
630
  const url = new URL(req.url);
298
631
  const span = tracer.startSpan(`${req.method} ${url.pathname}`, {
299
632
  kind: SpanKind.SERVER,
@@ -304,35 +637,90 @@ export function instrument(handler, config) {
304
637
  "server.address": url.host,
305
638
  },
306
639
  });
307
- // waitUntil a promise that drains BOTH the provider's span processor and
308
- // the exporter's in-flight fetch. Awaiting only provider.forceFlush()
309
- // would return before the export POST completes, letting the isolate be
310
- // torn down and silently dropping the trace.
311
- const flush = () => ctx.waitUntil((async () => {
312
- await provider.forceFlush().catch(() => { });
313
- await provider.heystackExporter.forceFlush().catch(() => { });
314
- })());
315
640
  try {
316
- const response = await context.with(trace.setSpan(context.active(), span), () => handler.fetch(req, env, ctx));
641
+ const response = await context.with(trace.setSpan(context.active(), span), () => originalFetch(req, env, ctx));
317
642
  span.setAttribute("http.response.status_code", response.status);
318
643
  span.setStatus({
319
644
  code: response.status >= 500 ? SpanStatusCode.ERROR : SpanStatusCode.UNSET,
320
645
  });
321
646
  span.end();
322
- flush();
647
+ drain(provider, ctx);
323
648
  return response;
324
649
  }
325
650
  catch (error) {
326
- if (error instanceof Error)
327
- span.recordException(error);
651
+ span.recordException(error instanceof Error ? error : new Error(String(error)));
328
652
  span.setStatus({
329
653
  code: SpanStatusCode.ERROR,
330
654
  message: error instanceof Error ? error.message : String(error),
331
655
  });
332
656
  span.end();
333
- flush();
657
+ drain(provider, ctx);
334
658
  throw error;
335
659
  }
336
- },
337
- };
660
+ };
661
+ }
662
+ const originalQueue = handler.queue?.bind(handler);
663
+ if (originalQueue) {
664
+ wrapped.queue = async (batch, env, ctx) => {
665
+ const s = setup(env);
666
+ if (!s)
667
+ return originalQueue(batch, env, ctx);
668
+ const { provider, tracer } = s;
669
+ const span = tracer.startSpan(`queue ${batch.queue}`, {
670
+ kind: SpanKind.CONSUMER,
671
+ attributes: {
672
+ "messaging.batch.message_count": batch.messages.length,
673
+ "messaging.destination.name": batch.queue,
674
+ },
675
+ });
676
+ try {
677
+ await context.with(trace.setSpan(context.active(), span), () => originalQueue(batch, env, ctx));
678
+ span.setStatus({ code: SpanStatusCode.UNSET });
679
+ span.end();
680
+ drain(provider, ctx);
681
+ }
682
+ catch (error) {
683
+ span.recordException(error instanceof Error ? error : new Error(String(error)));
684
+ span.setStatus({
685
+ code: SpanStatusCode.ERROR,
686
+ message: error instanceof Error ? error.message : String(error),
687
+ });
688
+ span.end();
689
+ drain(provider, ctx);
690
+ throw error;
691
+ }
692
+ };
693
+ }
694
+ const originalScheduled = handler.scheduled?.bind(handler);
695
+ if (originalScheduled) {
696
+ wrapped.scheduled = async (controller, env, ctx) => {
697
+ const s = setup(env);
698
+ if (!s)
699
+ return originalScheduled(controller, env, ctx);
700
+ const { provider, tracer } = s;
701
+ const span = tracer.startSpan(`scheduled ${controller.cron}`, {
702
+ kind: SpanKind.INTERNAL,
703
+ attributes: {
704
+ "controller.cron": controller.cron,
705
+ },
706
+ });
707
+ try {
708
+ await context.with(trace.setSpan(context.active(), span), () => originalScheduled(controller, env, ctx));
709
+ span.setStatus({ code: SpanStatusCode.UNSET });
710
+ span.end();
711
+ drain(provider, ctx);
712
+ }
713
+ catch (error) {
714
+ span.recordException(error instanceof Error ? error : new Error(String(error)));
715
+ span.setStatus({
716
+ code: SpanStatusCode.ERROR,
717
+ message: error instanceof Error ? error.message : String(error),
718
+ });
719
+ span.end();
720
+ drain(provider, ctx);
721
+ throw error;
722
+ }
723
+ };
724
+ }
725
+ return wrapped;
338
726
  }
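The bounded drain used by the wrapped handlers above (race the flush against a timer, then clear the timer) can be sketched on its own. `raceWithTimeout` is a hypothetical helper name, and the `"drained"`/`"timeout"` labels are added here only to make the outcome observable. The code in the diff passes the raced promise straight to `ctx.waitUntil` without reporting which side won.

```typescript
// Sketch: wait for `work`, but never longer than `ms` milliseconds.
// Errors from `work` are swallowed, as in the drain above.
function raceWithTimeout(
  work: Promise<unknown>,
  ms: number,
): Promise<"drained" | "timeout"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });
  const drained = work.then(
    () => "drained" as const,
    () => "drained" as const,
  );
  return Promise.race([drained, timeout]).finally(() => {
    // Clear the timer so a fast drain does not leave it pending.
    if (timer !== undefined) clearTimeout(timer);
  });
}

// A flush that settles immediately versus one that never settles.
const fast = raceWithTimeout(Promise.resolve(), 50);
const hung = raceWithTimeout(new Promise<void>(() => {}), 10);
```

On timeout the underlying work is not cancelled. The caller simply stops waiting, which matches the note above that the POST "may yet complete".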
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@heystack/otel",
3
- "version": "0.2.1",
3
+ "version": "0.3.2",
4
4
  "description": "Runtime-aware OpenTelemetry tracing that exports to Heystack (Node, Next.js, Workers).",
5
5
  "license": "MIT",
6
6
  "type": "module",
@@ -22,6 +22,7 @@
22
22
  },
23
23
  "dependencies": {
24
24
  "@opentelemetry/api": "^1.9.0",
25
+ "@opentelemetry/core": "^1.30.0",
25
26
  "@opentelemetry/sdk-node": "^0.57.0",
26
27
  "@opentelemetry/exporter-trace-otlp-http": "^0.57.0",
27
28
  "@opentelemetry/auto-instrumentations-node": "^0.55.0",