tunnel-mcp 0.1.2 → 0.1.4

package/CHANGELOG.md CHANGED
@@ -11,6 +11,50 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
11
11
 
12
12
  - Nothing yet.
13
13
 
14
+ ## [0.1.4] - 2026-07-01
15
+
16
+ ### Fixed
17
+
18
+ - **Guests no longer fail to join with `getaddrinfo ENOTFOUND …trycloudflare.com`.**
19
+ Root cause: a cloudflared quick tunnel prints its URL ~8–25s before the
20
+ per-tunnel DNS record propagates, and the old host-side reachability probe
21
+ looked the name up immediately — seeding an `NXDOMAIN` that the resolver
22
+ negative-cached for up to 30 minutes (the zone's SOA minimum), breaking the
23
+ guest's join even after the tunnel went live. The probe was the cause, not a
24
+ diagnostic.
25
+
26
+ ### Changed
27
+
28
+ - **`tunnel_open` now gates the join link on real DNS readiness via DoH.** After
29
+ cloudflared reports the URL, the host polls liveness over IP-literal DoH
30
+ endpoints (`1.1.1.1`/`1.0.0.1`/`8.8.8.8`) — which never touch, and so never
31
+ poison, the system resolver — and only returns the link once the record
32
+ resolves (best-effort: it never blocks or hard-fails; after a budget it returns
33
+ the link anyway).
34
+ - **The guest resolves system-first with a DoH fallback**, connecting by the
35
+ resolved IP while keeping SNI/Host = the hostname, so a guest whose resolver
36
+ lags or holds a stale negative cache still connects.
37
+ - **Guest connection is now time-bounded** (handshake + overall connect deadline),
38
+ so a black-hole link fails fast with a clear error instead of hanging.
39
+ - The `0.1.3` `TUNNEL_REACHABILITY` (and `0.1.2` `TUNNEL_SKIP_REACHABILITY_CHECK`)
40
+ environment variables are **no longer read** — they only ever relaxed the
41
+ now-deleted probe. New single knob: `TUNNEL_DOH=off` disables DoH for networks
42
+ that block it and where system DNS already works.
43
+
44
+ ## [0.1.3] - 2026-07-01
45
+
46
+ ### Changed
47
+
48
+ - **`tunnel_open` no longer hard-fails when the host can't reach
49
+ `*.trycloudflare.com`.** Because this is a cross-network tool, only the guest's
50
+ network has to reach the link — so a host-side reachability-probe failure
51
+ (blocked DNS, or a proxy Node's `fetch` ignores) now **opens the tunnel anyway
52
+ and returns a `reachabilityWarning`** by default, instead of blocking a tunnel
53
+ that would have worked for the guest. Behavior is configurable via
54
+ `TUNNEL_REACHABILITY`: `warn` (default), `strict` (previous hard-fail), or
55
+ `off` (skip the probe). This replaces the `TUNNEL_SKIP_REACHABILITY_CHECK` flag
56
+ from 0.1.2, which is still honored as `off` for backward compatibility.
57
+
14
58
  ## [0.1.2] - 2026-07-01
15
59
 
16
60
  ### Added
@@ -80,7 +124,9 @@ install-skill` copies the `tunnel-etiquette` skill into `~/.claude/skills`
80
124
  declaring a fix "confirmed".
81
125
  - Test suite of 109 tests built with vitest, developed test-first (TDD).
82
126
 
83
- [Unreleased]: https://github.com/zachlikefolio/tunnel-mcp/compare/v0.1.2...HEAD
127
+ [Unreleased]: https://github.com/zachlikefolio/tunnel-mcp/compare/v0.1.4...HEAD
128
+ [0.1.4]: https://github.com/zachlikefolio/tunnel-mcp/compare/v0.1.3...v0.1.4
129
+ [0.1.3]: https://github.com/zachlikefolio/tunnel-mcp/compare/v0.1.2...v0.1.3
84
130
  [0.1.2]: https://github.com/zachlikefolio/tunnel-mcp/compare/v0.1.1...v0.1.2
85
131
  [0.1.1]: https://github.com/zachlikefolio/tunnel-mcp/compare/v0.1.0...v0.1.1
86
132
  [0.1.0]: https://github.com/zachlikefolio/tunnel-mcp/releases/tag/v0.1.0
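The 0.1.3 entry above describes three probe modes plus a legacy flag. A minimal sketch of how such a setting could be resolved follows. The helper name `reachabilityMode` is hypothetical, and the precedence between the two variables is an assumption, since the 0.1.3 source is not part of this diff. As of 0.1.4 neither variable is read at all.

```javascript
// Hypothetical helper, not taken from the package source.
// Maps the 0.1.3 environment variables onto one of three probe modes.
function reachabilityMode(env) {
  const raw = (env.TUNNEL_REACHABILITY ?? '').trim().toLowerCase();
  if (raw === 'warn' || raw === 'strict' || raw === 'off') return raw;
  // Assumption: the legacy 0.1.2 flag only matters when the new variable
  // is unset or unrecognized.
  if (env.TUNNEL_SKIP_REACHABILITY_CHECK === '1') return 'off';
  return 'warn'; // documented default
}
```

Passing the environment in as an argument (for example `reachabilityMode(process.env)`) keeps the function easy to exercise without mutating global state.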
package/README.md CHANGED
@@ -162,7 +162,7 @@ vulnerability.
162
162
 
163
163
  ```bash
164
164
  npm ci # install dependencies
165
- npm test # run the test suite (136 tests, TDD)
165
+ npm test # run the test suite (159 tests, TDD)
166
166
  npm run build # compile TypeScript
167
167
  npm run lint # eslint
168
168
  npm run format:check # prettier --check .
@@ -178,15 +178,18 @@ an interactive CLI — with no arguments it starts and waits for an MCP client t
178
178
  connect over stdin/stdout. That's working as intended. Register it with a client
179
179
  (above), or run `tunnel-mcp --help`.
180
180
 
181
- **`tunnel_open` fails with "never became reachable" / can't resolve
182
- `*.trycloudflare.com`.** cloudflared reaches Cloudflare's edge over its own
183
- protocol, but the public `*.trycloudflare.com` hostname still has to resolve via
184
- normal DNS and some networks (corporate/filtered networks, and a few public
185
- DNS resolvers) block `trycloudflare.com`. Both you **and your guest** need to be
186
- able to resolve it. Check with `dig +short <random>.trycloudflare.com` or
187
- `curl -sI https://<the-url>`. If only your guest's network needs to reach the
188
- URL, set `TUNNEL_SKIP_REACHABILITY_CHECK=1` to open the tunnel without the
189
- host-side reachability probe.
181
+ **Guest join fails with `getaddrinfo ENOTFOUND …trycloudflare.com`.** A
182
+ cloudflared quick tunnel prints its URL a few seconds _before_ the per-tunnel DNS
183
+ record has propagated. If anything looks the name up too early it gets an
184
+ `NXDOMAIN` that the resolver negative-caches for up to 30 minutes, breaking the
185
+ join even after the tunnel is live. `tunnel-mcp` avoids this: `tunnel_open` waits
186
+ for the record to actually resolve (via DoH to Cloudflare's `1.1.1.1`, an IP that
187
+ never touches, and so never poisons, your system resolver) before returning the
188
+ link, and the guest resolves system-first with a DoH fallback. So a fresh join
189
+ should just work; if you hit `ENOTFOUND`, an _earlier_ attempt likely poisoned the
190
+ cache — wait for it to expire, or flush DNS (`sudo dscacheutil -flushcache` on
191
+ macOS). Set `TUNNEL_DOH=off` only on networks that block DoH (`1.1.1.1`) and where
192
+ system DNS already resolves `*.trycloudflare.com`.
190
193
 
191
194
  ## Roadmap / not yet supported
192
195
 
package/dist/cli.js CHANGED
@@ -60,10 +60,12 @@ export function helpText(version = readVersion()) {
60
60
  ` --force, -f Overwrite an existing install`,
61
61
  ``,
62
62
  `Environment:`,
63
- ` TUNNEL_SKILLS_DIR Override the skills directory`,
64
- ` TUNNEL_SKIP_SKILL_INSTALL=1 Skip the automatic skill install on npm install`,
65
- ` TUNNEL_SKIP_REACHABILITY_CHECK=1 Open a tunnel even if this machine can't reach`,
66
- ` *.trycloudflare.com (your guest still must)`,
63
+ ` TUNNEL_SKILLS_DIR Override the skills directory`,
64
+ ` TUNNEL_SKIP_SKILL_INSTALL=1 Skip the automatic skill install on npm install`,
65
+ ` TUNNEL_DOH=off Disable DoH used to confirm the tunnel hostname is`,
66
+ ` live before sharing the link, and as the guest's`,
67
+ ` resolver fallback. Only disable on networks that block`,
68
+ ` DoH (1.1.1.1) and where system DNS already works.`,
67
69
  ``,
68
70
  `Docs: https://github.com/zachlikefolio/tunnel-mcp`,
69
71
  ].join('\n');
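The help text above documents `TUNNEL_DOH=off`, and a source comment later in this diff says that only an explicit off/0/false/no disables DoH. The sketch below shows one way to read the variable under that rule. `dohEnabledFromEnv` is a hypothetical name, and the package's real `envFlag()` helper is not shown here, so its handling of other values may differ.

```javascript
// Hypothetical helper; the package's real envFlag() is not shown in this diff.
const DISABLING_VALUES = new Set(['off', '0', 'false', 'no']);

function dohEnabledFromEnv(env) {
  const raw = env.TUNNEL_DOH;
  if (raw === undefined) return true; // unset means DoH stays on
  // Assumption: matching is case-insensitive and ignores surrounding spaces.
  return !DISABLING_VALUES.has(raw.trim().toLowerCase());
}
```

For example, `dohEnabledFromEnv({ TUNNEL_DOH: 'off' })` is `false`, while `dohEnabledFromEnv({})` is `true`.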
@@ -1,3 +1,4 @@
1
+ import { DohResult } from '../net/doh.js';
1
2
  export interface TunnelHandle {
2
3
  publicUrl: string;
3
4
  stop(): void;
@@ -5,20 +6,25 @@ export interface TunnelHandle {
5
6
  export interface StartOptions {
6
7
  timeoutMs?: number;
7
8
  extraArgs?: string[];
8
- attempts?: number;
9
- intervalMs?: number;
10
- healthCheck?: (url: string) => Promise<boolean>;
11
- probeTimeoutMs?: number;
12
- skipHealthCheck?: boolean;
9
+ budgetMs?: number;
10
+ pollIntervalMs?: number;
11
+ initialDelayMs?: number;
12
+ resolveHost?: (host: string) => Promise<DohResult>;
13
+ dohEnabled?: boolean;
13
14
  }
14
15
  export declare function parsePublicUrl(line: string): string | null;
15
- export declare function describeProbeError(e: unknown): string;
16
- export declare function defaultHealthCheck(url: string): Promise<boolean>;
17
- export declare function unreachableMessage(url: string, attempts: number, lastReason?: string): string;
18
16
  /**
19
- * `extraArgs` exists for tests: it lets a fake binary (e.g. `node fake.mjs`) be
20
- * launched in place of `cloudflared tunnel --url ...`. Production passes none.
21
- * The URL is surfaced only after a health probe confirms the edge is reachable
22
- * (cloudflared prints the hostname before routing is live).
17
+ * Spawn `cloudflared tunnel --url ...` and resolve with the public URL, but
18
+ * only after a readiness gate confirms the per-tunnel DNS record has propagated.
19
+ *
20
+ * cloudflared prints the URL ~8–25s before the record exists. Looking the name
21
+ * up via the system resolver during that window returns NXDOMAIN and gets it
22
+ * negative-cached (SOA min 1800s), breaking the guest's join for up to 30 min.
23
+ * So the gate polls DoH over IP-literal endpoints (which never touch the system
24
+ * resolver, so they cannot poison anything). It is best-effort: it never blocks
25
+ * on "not live yet" or "DoH unavailable" — after the budget it returns the link
26
+ * optimistically (the guest's own DoH fallback is the safety net).
27
+ *
28
+ * `extraArgs` exists for tests: it launches a fake binary in place of cloudflared.
23
29
  */
24
30
  export declare function startCloudflared(binPath: string, localPort: number, opts?: StartOptions): Promise<TunnelHandle>;
@@ -1,130 +1,55 @@
1
1
  import { spawn } from 'node:child_process';
2
- import { CLOUDFLARED_URL_TIMEOUT_MS, CLOUDFLARED_HEALTH_ATTEMPTS, CLOUDFLARED_HEALTH_INTERVAL_MS, } from '../config.js';
2
+ import { CLOUDFLARED_URL_TIMEOUT_MS, READINESS_GATE_BUDGET_MS, READINESS_INITIAL_DELAY_MS, READINESS_POLL_INTERVAL_MS, } from '../config.js';
3
+ import { dohResolve } from '../net/doh.js';
4
+ import { envFlag } from '../env.js';
3
5
  const URL_RE = /https:\/\/[a-z0-9-]+\.trycloudflare\.com/;
4
- // Bounds a single probe so a black-hole connection (or a caller-supplied
5
- // healthCheck that hangs/throws) can't stall the health-check loop forever.
6
- const DEFAULT_PROBE_TIMEOUT_MS = 5000;
6
+ const delay = (ms) => new Promise((r) => setTimeout(r, ms));
7
7
  export function parsePublicUrl(line) {
8
8
  const m = line.match(URL_RE);
9
9
  return m ? m[0] : null;
10
10
  }
11
- // undici (Node's global fetch) reports the real network error on `.cause`, e.g.
12
- // a DNS failure surfaces as `cause.code === 'ENOTFOUND'`. Pull that out so the
13
- // caller can tell "DNS can't resolve the host" apart from "edge not ready yet".
14
- export function describeProbeError(e) {
15
- const err = e;
16
- const code = err?.cause?.code;
17
- if (code)
18
- return err.cause?.message ? `${code}: ${err.cause.message}` : code;
19
- if (err?.name === 'TimeoutError')
20
- return 'probe timed out';
21
- return err?.message || 'unknown error';
22
- }
23
- // Any HTTP response (even 404/502/426) means the Cloudflare edge is routing to
24
- // us. A thrown error carries why it failed (DNS, TLS, refused, timeout).
25
- async function reachabilityProbe(url, probeTimeoutMs) {
26
- try {
27
- await fetch(url, { method: 'GET', signal: AbortSignal.timeout(probeTimeoutMs) });
28
- return { ok: true };
29
- }
30
- catch (e) {
31
- return { ok: false, reason: describeProbeError(e) };
32
- }
33
- }
34
- // Back-compat boolean probe (kept for external callers/tests).
35
- export async function defaultHealthCheck(url) {
36
- return (await reachabilityProbe(url, DEFAULT_PROBE_TIMEOUT_MS)).ok;
37
- }
38
- // Builds the actionable "never became reachable" error. Names the host and,
39
- // when the failure looks like DNS resolution, points at *.trycloudflare.com
40
- // blocking — the single most common real-world cause. Always mentions the
41
- // escape hatch, since the guest's network (not the host's) is what must reach
42
- // the URL for messaging.
43
- export function unreachableMessage(url, attempts, lastReason) {
44
- let host = url;
45
- try {
46
- host = new URL(url).host;
47
- }
48
- catch {
49
- /* keep the raw url */
50
- }
51
- const dnsish = !!lastReason && /ENOTFOUND|EAI_AGAIN|getaddrinfo|\bdns\b/i.test(lastReason);
52
- let msg = `cloudflared reported ${url} but it never became reachable from this machine after ${attempts} probe(s)`;
53
- if (lastReason)
54
- msg += ` (last error: ${lastReason})`;
55
- msg += '.';
56
- if (dnsish) {
57
- msg +=
58
- ` This machine can't resolve ${host} — your DNS or network may be blocking *.trycloudflare.com` +
59
- ` (common on filtered/corporate networks and some public DNS resolvers). You and your guest both` +
60
- ` need to be able to resolve it.`;
61
- }
62
- msg += ` If you're confident your guest's network can reach the URL, set TUNNEL_SKIP_REACHABILITY_CHECK=1 to open the tunnel anyway.`;
63
- return msg;
64
- }
65
- // Races a single probe against a per-attempt timeout so that a caller-supplied
66
- // `check` that throws, rejects, or simply never resolves can never leave the
67
- // loop (and therefore the outer startCloudflared promise) hanging.
68
- function probeOnce(url, probe, probeTimeoutMs) {
69
- return new Promise((resolve) => {
70
- let settled = false;
71
- const finish = (r) => {
72
- if (!settled) {
73
- settled = true;
74
- resolve(r);
75
- }
76
- };
77
- const timer = setTimeout(() => finish({ ok: false, reason: 'probe timed out' }), probeTimeoutMs);
78
- Promise.resolve()
79
- .then(() => probe(url))
80
- .then((r) => {
81
- clearTimeout(timer);
82
- finish(r);
83
- })
84
- .catch((err) => {
85
- clearTimeout(timer);
86
- finish({ ok: false, reason: describeProbeError(err) });
87
- });
88
- });
89
- }
90
- async function waitHealthy(url, attempts, intervalMs, probe, probeTimeoutMs) {
91
- let lastReason;
92
- for (let i = 0; i < attempts; i++) {
93
- const r = await probeOnce(url, probe, probeTimeoutMs);
94
- if (r.ok)
95
- return r;
96
- lastReason = r.reason;
97
- await new Promise((res) => setTimeout(res, intervalMs));
98
- }
99
- return { ok: false, reason: lastReason };
11
+ function dohOn(explicit) {
12
+ return explicit ?? (process.env.TUNNEL_DOH === undefined || envFlag('TUNNEL_DOH'));
100
13
  }
101
14
  /**
102
- * `extraArgs` exists for tests: it lets a fake binary (e.g. `node fake.mjs`) be
103
- * launched in place of `cloudflared tunnel --url ...`. Production passes none.
104
- * The URL is surfaced only after a health probe confirms the edge is reachable
105
- * (cloudflared prints the hostname before routing is live).
15
+ * Spawn `cloudflared tunnel --url ...` and resolve with the public URL, but
16
+ * only after a readiness gate confirms the per-tunnel DNS record has propagated.
17
+ *
18
+ * cloudflared prints the URL ~8–25s before the record exists. Looking the name
19
+ * up via the system resolver during that window returns NXDOMAIN and gets it
20
+ * negative-cached (SOA min 1800s), breaking the guest's join for up to 30 min.
21
+ * So the gate polls DoH over IP-literal endpoints (which never touch the system
22
+ * resolver, so they cannot poison anything). It is best-effort: it never blocks
23
+ * on "not live yet" or "DoH unavailable" — after the budget it returns the link
24
+ * optimistically (the guest's own DoH fallback is the safety net).
25
+ *
26
+ * `extraArgs` exists for tests: it launches a fake binary in place of cloudflared.
106
27
  */
107
28
  export function startCloudflared(binPath, localPort, opts = {}) {
108
29
  const args = opts.extraArgs ?? ['tunnel', '--url', `http://localhost:${localPort}`];
109
30
  const timeoutMs = opts.timeoutMs ?? CLOUDFLARED_URL_TIMEOUT_MS;
110
- const attempts = opts.attempts ?? CLOUDFLARED_HEALTH_ATTEMPTS;
111
- const intervalMs = opts.intervalMs ?? CLOUDFLARED_HEALTH_INTERVAL_MS;
112
- const probeTimeoutMs = opts.probeTimeoutMs ?? DEFAULT_PROBE_TIMEOUT_MS;
113
- // A caller-supplied boolean healthCheck carries no failure reason; the default
114
- // probe does. Adapt the former into a ProbeResult either way.
115
- const custom = opts.healthCheck;
116
- const probe = custom
117
- ? async (u) => ({ ok: await custom(u) })
118
- : (u) => reachabilityProbe(u, probeTimeoutMs);
31
+ const budgetMs = opts.budgetMs ?? READINESS_GATE_BUDGET_MS;
32
+ const pollIntervalMs = opts.pollIntervalMs ?? READINESS_POLL_INTERVAL_MS;
33
+ const initialDelayMs = opts.initialDelayMs ?? READINESS_INITIAL_DELAY_MS;
34
+ const resolveHost = opts.resolveHost ?? ((h) => dohResolve(h, 4));
35
+ const dohEnabled = dohOn(opts.dohEnabled);
119
36
  return new Promise((resolve, reject) => {
120
37
  const child = spawn(binPath, args, { stdio: ['ignore', 'pipe', 'pipe'] });
121
38
  let settled = false;
39
+ let gateStarted = false;
40
+ let exited = false;
122
41
  const stop = () => {
123
42
  try {
124
43
  child.kill('SIGTERM');
125
44
  }
126
45
  catch {
127
- /* gone */
46
+ /* already gone */
47
+ }
48
+ };
49
+ const succeed = (url) => {
50
+ if (!settled) {
51
+ settled = true;
52
+ resolve({ publicUrl: url, stop });
128
53
  }
129
54
  };
130
55
  const fail = (err) => {
@@ -136,36 +61,42 @@ export function startCloudflared(binPath, localPort, opts = {}) {
136
61
  }
137
62
  };
138
63
  const timer = setTimeout(() => fail(new Error('cloudflared did not report a URL in time')), timeoutMs);
64
+ const runGate = async (url, host) => {
65
+ if (!dohEnabled) {
66
+ await delay(pollIntervalMs); // brief settle; the guest's DoH fallback covers readiness
67
+ succeed(url);
68
+ return;
69
+ }
70
+ await delay(initialDelayMs);
71
+ const deadline = Date.now() + budgetMs;
72
+ while (!settled && !exited && Date.now() < deadline) {
73
+ const res = await resolveHost(host).catch(() => ({ klass: 'INDETERMINATE', addresses: [] }));
74
+ if (res.klass === 'RESOLVED') {
75
+ succeed(url);
76
+ return;
77
+ }
78
+ await delay(pollIntervalMs);
79
+ }
80
+ if (settled)
81
+ return;
82
+ if (exited) {
83
+ fail(new Error('cloudflared exited during readiness wait'));
84
+ return;
85
+ }
86
+ // Budget exhausted without a RESOLVED — hand out the link optimistically.
87
+ // No system-DNS lookup ever happened, so nothing was poisoned, and the
88
+ // guest resolves the name itself (system-first, DoH fallback).
89
+ succeed(url);
90
+ };
139
91
  const onData = (buf) => {
92
+ if (gateStarted)
93
+ return;
140
94
  for (const line of buf.toString().split('\n')) {
141
95
  const url = parsePublicUrl(line);
142
- if (url && !settled) {
143
- settled = true;
96
+ if (url) {
97
+ gateStarted = true;
144
98
  clearTimeout(timer);
145
- // Escape hatch: the reachability probe runs on the *host*, but only the
146
- // guest's network must reach the URL for messaging. A host on a filtered
147
- // network can opt to skip the probe and hand out the link regardless.
148
- if (opts.skipHealthCheck) {
149
- resolve({ publicUrl: url, stop });
150
- return;
151
- }
152
- waitHealthy(url, attempts, intervalMs, probe, probeTimeoutMs)
153
- .then((res) => {
154
- if (res.ok)
155
- resolve({ publicUrl: url, stop });
156
- else {
157
- stop();
158
- reject(new Error(unreachableMessage(url, attempts, res.reason)));
159
- }
160
- })
161
- .catch((err) => {
162
- // Should be unreachable (waitHealthy/probeOnce never reject), but
163
- // this guarantees the child is never orphaned and the outer
164
- // promise always settles, even on a future bug or surprise throw.
165
- stop();
166
- const reason = err instanceof Error ? err.message : String(err);
167
- reject(new Error(`cloudflared health check failed unexpectedly: ${reason}`));
168
- });
99
+ void runGate(url, new URL(url).host);
169
100
  return;
170
101
  }
171
102
  }
@@ -173,6 +104,9 @@ export function startCloudflared(binPath, localPort, opts = {}) {
173
104
  child.stdout?.on('data', onData);
174
105
  child.stderr?.on('data', onData);
175
106
  child.on('error', (err) => fail(err));
176
- child.on('exit', (code) => fail(new Error(`cloudflared exited (${code})`)));
107
+ child.on('exit', (code) => {
108
+ exited = true;
109
+ fail(new Error(`cloudflared exited (${code})`));
110
+ });
177
111
  });
178
112
  }
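The readiness gate in `runGate` above can be read in isolation as a bounded polling loop. The sketch below restates that loop as a standalone function so its two outcomes are easy to see. `waitForRecord` and the `outcome`/`polls` fields are illustrative names, not part of the package, and the child-process bookkeeping (`settled`, `exited`) is left out.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls `resolveHost` until it reports RESOLVED or the budget runs out.
// A rejected resolver promise is treated as INDETERMINATE, so the loop
// keeps polling instead of failing.
async function waitForRecord(resolveHost, host, { initialDelayMs, pollIntervalMs, budgetMs }) {
  await sleep(initialDelayMs);
  const deadline = Date.now() + budgetMs;
  let polls = 0;
  while (Date.now() < deadline) {
    polls += 1;
    const res = await resolveHost(host).catch(() => ({ klass: 'INDETERMINATE', addresses: [] }));
    if (res.klass === 'RESOLVED') return { outcome: 'resolved', polls };
    await sleep(pollIntervalMs);
  }
  // Best-effort: the caller hands out the link anyway at this point.
  return { outcome: 'budget-exhausted', polls };
}
```

Injecting `resolveHost` mirrors the `resolveHost` option in `StartOptions`, which lets a test substitute a fake resolver and millisecond-scale delays.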
package/dist/config.d.ts CHANGED
@@ -1,3 +1,4 @@
1
+ import type { DohProvider } from './net/doh.js';
1
2
  export declare const TUNNEL_HOME: string;
2
3
  export declare const BIN_DIR: string;
3
4
  export declare const SESSIONS_DIR: string;
@@ -5,6 +6,14 @@ export declare const DEFAULT_LISTEN_TIMEOUT_MS = 60000;
5
6
  export declare const DEFAULT_IDLE_TEARDOWN_MS: number;
6
7
  export declare const DEFAULT_JOIN_LINK_TTL_MS: number;
7
8
  export declare const CLOUDFLARED_URL_TIMEOUT_MS = 30000;
8
- export declare const CLOUDFLARED_HEALTH_ATTEMPTS = 10;
9
- export declare const CLOUDFLARED_HEALTH_INTERVAL_MS = 1000;
10
9
  export declare const OPEN_RETRY_ATTEMPTS = 3;
10
+ export declare const READINESS_GATE_BUDGET_MS = 60000;
11
+ export declare const READINESS_INITIAL_DELAY_MS = 5000;
12
+ export declare const READINESS_POLL_INTERVAL_MS = 1000;
13
+ export declare const DOH_REQUEST_TIMEOUT_MS = 3000;
14
+ export declare const DOH_PROVIDERS: DohProvider[];
15
+ export declare const GUEST_HANDSHAKE_TIMEOUT_MS = 15000;
16
+ export declare const GUEST_CONNECT_DEADLINE_MS = 20000;
17
+ export declare const GUEST_SYS_LOOKUP_TIMEOUT_MS = 2000;
18
+ export declare const DOH_GUEST_RETRIES = 3;
19
+ export declare const DOH_GUEST_RETRY_DELAY_MS = 700;
package/dist/config.js CHANGED
@@ -8,8 +8,41 @@ export const DEFAULT_IDLE_TEARDOWN_MS = 30 * 60_000;
8
8
  // Join links are single-use and expire after this window; a leaked link that
9
9
  // is never used (or is reused after the guest joined) can't admit anyone.
10
10
  export const DEFAULT_JOIN_LINK_TTL_MS = 10 * 60_000;
11
- // cloudflared startup robustness
11
+ // cloudflared startup
12
12
  export const CLOUDFLARED_URL_TIMEOUT_MS = 30_000; // wait for the URL line
13
- export const CLOUDFLARED_HEALTH_ATTEMPTS = 10; // edge-reachability probes
14
- export const CLOUDFLARED_HEALTH_INTERVAL_MS = 1_000; // delay between probes
15
13
  export const OPEN_RETRY_ATTEMPTS = 3; // re-spawn attempts in session.open
14
+ // Host readiness gate. cloudflared prints the quick-tunnel URL before the
15
+ // per-tunnel DNS record has propagated (~8–25s). Any early lookup of the name
16
+ // via the system resolver would be NXDOMAIN and get negative-cached for the
17
+ // zone's SOA minimum (1800s), breaking the guest's join for up to 30 minutes.
18
+ // So we confirm liveness via DoH to IP-literal endpoints (which never touch the
19
+ // system resolver) before handing out the link.
20
+ export const READINESS_GATE_BUDGET_MS = 60_000; // total wait for the record to go live
21
+ export const READINESS_INITIAL_DELAY_MS = 5_000; // delay before the first poll (the record never appears sooner than ~8s)
22
+ export const READINESS_POLL_INTERVAL_MS = 1_000; // between DoH polls
23
+ // DoH resolver
24
+ export const DOH_REQUEST_TIMEOUT_MS = 3_000; // per-request (measured 40–110ms)
25
+ export const DOH_PROVIDERS = [
26
+ {
27
+ name: 'cloudflare',
28
+ url: (h, t) => `https://1.1.1.1/dns-query?name=${encodeURIComponent(h)}&type=${t}`,
29
+ headers: { accept: 'application/dns-json' },
30
+ },
31
+ {
32
+ name: 'cloudflare2',
33
+ url: (h, t) => `https://1.0.0.1/dns-query?name=${encodeURIComponent(h)}&type=${t}`,
34
+ headers: { accept: 'application/dns-json' },
35
+ },
36
+ // dns.google's cert carries an 8.8.8.8 SAN; the JSON endpoint is /resolve
37
+ // (NOT /dns-query, which expects wire format). IP-literal, so no system DNS.
38
+ {
39
+ name: 'google',
40
+ url: (h, t) => `https://8.8.8.8/resolve?name=${encodeURIComponent(h)}&type=${t}`,
41
+ },
42
+ ];
43
+ // Guest connection bounds (so a black-hole/lagging resolver can't hang the join)
44
+ export const GUEST_HANDSHAKE_TIMEOUT_MS = 15_000; // ws handshake (DNS+TCP+TLS+upgrade)
45
+ export const GUEST_CONNECT_DEADLINE_MS = 20_000; // overall connect+auth deadline (> handshake)
46
+ export const GUEST_SYS_LOOKUP_TIMEOUT_MS = 2_000; // bound the system-first lookup before DoH fallback
47
+ export const DOH_GUEST_RETRIES = 3; // DoH attempts in the guest fallback
48
+ export const DOH_GUEST_RETRY_DELAY_MS = 700; // backoff between guest DoH attempts
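The constants above interact: one `dohResolve` pass tries the providers in sequence, and the gate loop checks its deadline only before starting a pass. The arithmetic below is a rough upper bound on how long `tunnel_open` could wait for the gate under those assumptions. It is derived from the values in this diff, not from any measurement, and the variable names `slowestPassMs` and `worstCaseGateMs` are illustrative.

```javascript
// Values copied from the config.js hunk above.
const READINESS_GATE_BUDGET_MS = 60_000;
const READINESS_INITIAL_DELAY_MS = 5_000;
const READINESS_POLL_INTERVAL_MS = 1_000;
const DOH_REQUEST_TIMEOUT_MS = 3_000;
const PROVIDER_COUNT = 3; // cloudflare, cloudflare2, google

// One pass queries every provider in turn, so it can take count * timeout.
const slowestPassMs = PROVIDER_COUNT * DOH_REQUEST_TIMEOUT_MS;

// A pass that starts just before the deadline still runs to completion,
// followed by one poll-interval sleep, before the loop notices the deadline.
const worstCaseGateMs =
  READINESS_INITIAL_DELAY_MS + READINESS_GATE_BUDGET_MS + slowestPassMs + READINESS_POLL_INTERVAL_MS;

console.log(worstCaseGateMs); // prints 75000
```

In the common case the gate should finish far sooner, since the changelog puts propagation at roughly 8 to 25 seconds.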
@@ -0,0 +1,16 @@
1
+ export type DohClass = 'RESOLVED' | 'NXDOMAIN' | 'INDETERMINATE';
2
+ export interface DohAddress {
3
+ address: string;
4
+ family: 4 | 6;
5
+ }
6
+ export interface DohResult {
7
+ klass: DohClass;
8
+ addresses: DohAddress[];
9
+ }
10
+ export interface DohProvider {
11
+ name: string;
12
+ url: (host: string, type: 'A' | 'AAAA') => string;
13
+ headers?: Record<string, string>;
14
+ }
15
+ export declare function dohQueryOnce(provider: DohProvider, host: string, family: 4 | 6, timeoutMs?: number, fetchImpl?: typeof fetch): Promise<DohResult>;
16
+ export declare function dohResolve(host: string, family: 4 | 6, providers?: DohProvider[], timeoutMs?: number, fetchImpl?: typeof fetch): Promise<DohResult>;
@@ -0,0 +1,52 @@
1
+ import { isIP } from 'node:net';
2
+ import { DOH_PROVIDERS, DOH_REQUEST_TIMEOUT_MS } from '../config.js';
3
+ // Query ONE provider for ONE record type over an IP-literal endpoint (so it can
4
+ // never re-enter the system resolver). Never throws; classifies every failure.
5
+ export async function dohQueryOnce(provider, host, family, timeoutMs = DOH_REQUEST_TIMEOUT_MS, fetchImpl = fetch) {
6
+ const type = family === 6 ? 'AAAA' : 'A';
7
+ const rrType = family === 6 ? 28 : 1;
8
+ try {
9
+ const r = await fetchImpl(provider.url(host, type), {
10
+ headers: { accept: 'application/dns-json', ...(provider.headers ?? {}) },
11
+ signal: AbortSignal.timeout(timeoutMs),
12
+ });
13
+ if (!r.ok)
14
+ return { klass: 'INDETERMINATE', addresses: [] };
15
+ let j;
16
+ try {
17
+ j = (await r.json()); // captive-portal HTML / non-JSON body → catch below
18
+ }
19
+ catch {
20
+ return { klass: 'INDETERMINATE', addresses: [] };
21
+ }
22
+ if (!j || typeof j.Status !== 'number')
23
+ return { klass: 'INDETERMINATE', addresses: [] };
24
+ if (j.Status === 3)
25
+ return { klass: 'NXDOMAIN', addresses: [] }; // not live yet → keep polling
26
+ if (j.Status !== 0)
27
+ return { klass: 'INDETERMINATE', addresses: [] }; // SERVFAIL(2) etc → unreachable-ish
28
+ const answers = Array.isArray(j.Answer) ? j.Answer : [];
29
+ const addresses = answers
30
+ .filter((a) => a.type === rrType && typeof a.data === 'string' && isIP(a.data) === family)
31
+ .map((a) => ({ address: a.data, family }));
32
+ if (!addresses.length)
33
+ return { klass: 'NXDOMAIN', addresses: [] }; // A-less / CNAME-only → not routable yet
34
+ return { klass: 'RESOLVED', addresses };
35
+ }
36
+ catch {
37
+ return { klass: 'INDETERMINATE', addresses: [] }; // refused/timeout/ENETUNREACH/TLS reset
38
+ }
39
+ }
40
+ // Try providers in order; first RESOLVED wins. Fold classes: any NXDOMAIN (and
41
+ // no RESOLVED) → NXDOMAIN; otherwise INDETERMINATE (DoH itself unavailable).
42
+ export async function dohResolve(host, family, providers = DOH_PROVIDERS, timeoutMs = DOH_REQUEST_TIMEOUT_MS, fetchImpl = fetch) {
43
+ let sawNx = false;
44
+ for (const p of providers) {
45
+ const res = await dohQueryOnce(p, host, family, timeoutMs, fetchImpl);
46
+ if (res.klass === 'RESOLVED')
47
+ return res;
48
+ if (res.klass === 'NXDOMAIN')
49
+ sawNx = true;
50
+ }
51
+ return { klass: sawNx ? 'NXDOMAIN' : 'INDETERMINATE', addresses: [] };
52
+ }
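To make the three result classes concrete, the sketch below applies the same `Status`/`Answer` checks to a few sample DoH JSON bodies and then folds per-provider results as the comment above `dohResolve` describes. It is a simplification: `classifyDohJson` and `foldClasses` are illustrative names, the `isIP()` validation of each address is skipped, and the sample bodies are made up (the addresses come from documentation ranges).

```javascript
// Simplified classifier: mirrors the Status/Answer checks in dohQueryOnce
// but skips the isIP() validation of each address for brevity.
function classifyDohJson(body, family) {
  const rrType = family === 6 ? 28 : 1; // AAAA = 28, A = 1
  if (!body || typeof body.Status !== 'number') return 'INDETERMINATE';
  if (body.Status === 3) return 'NXDOMAIN'; // name does not exist yet
  if (body.Status !== 0) return 'INDETERMINATE'; // e.g. SERVFAIL
  const answers = Array.isArray(body.Answer) ? body.Answer : [];
  const hasAddress = answers.some((a) => a.type === rrType && typeof a.data === 'string');
  return hasAddress ? 'RESOLVED' : 'NXDOMAIN'; // CNAME-only counts as not live
}

// Folds the classes seen across providers: RESOLVED wins, then NXDOMAIN.
function foldClasses(classes) {
  if (classes.includes('RESOLVED')) return 'RESOLVED';
  if (classes.includes('NXDOMAIN')) return 'NXDOMAIN';
  return 'INDETERMINATE';
}

// Made-up sample bodies in the JSON shape the providers return.
const notLiveYet = { Status: 3 };
const cnameOnly = {
  Status: 0,
  Answer: [{ name: 'demo.trycloudflare.com', type: 5, TTL: 300, data: 'edge.example.net.' }],
};
const live = {
  Status: 0,
  Answer: [{ name: 'demo.trycloudflare.com', type: 1, TTL: 300, data: '192.0.2.10' }],
};
```

Unlike this fold, the real `dohResolve` returns as soon as one provider reports `RESOLVED`, so later providers are not queried at all.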
@@ -2,13 +2,19 @@ import { EventEmitter } from 'node:events';
2
2
  import { JoinLink } from '../protocol/link.js';
3
3
  import { SessionLog } from '../log/sessionLog.js';
4
4
  import { WireMessage } from '../protocol/messages.js';
5
+ export interface GuestNetOptions {
6
+ handshakeTimeoutMs?: number;
7
+ connectDeadlineMs?: number;
8
+ lookup?: unknown;
9
+ }
5
10
  export declare class GuestClient extends EventEmitter {
6
11
  private link;
7
12
  private guestName;
8
13
  private log;
14
+ private netOpts;
9
15
  private ws?;
10
16
  private pending;
11
- constructor(link: JoinLink, guestName: string, log: SessionLog);
17
+ constructor(link: JoinLink, guestName: string, log: SessionLog, netOpts?: GuestNetOptions);
12
18
  connect(sinceSeq?: number): Promise<{
13
19
  goal: string;
14
20
  peerName: string;
@@ -2,23 +2,61 @@ import { EventEmitter } from 'node:events';
2
2
  import WebSocket from 'ws';
3
3
  import { respondChallenge } from '../protocol/crypto.js';
4
4
  import { encodeFrame, decodeFrame } from '../protocol/messages.js';
5
- import { DEFAULT_LISTEN_TIMEOUT_MS } from '../config.js';
5
+ import { DEFAULT_LISTEN_TIMEOUT_MS, GUEST_HANDSHAKE_TIMEOUT_MS, GUEST_CONNECT_DEADLINE_MS, } from '../config.js';
6
+ import { makeGuestLookup } from './guestLookup.js';
6
7
  export class GuestClient extends EventEmitter {
7
8
  link;
8
9
  guestName;
9
10
  log;
11
+ netOpts;
10
12
  ws;
11
13
  pending = new Map();
12
- constructor(link, guestName, log) {
14
+ constructor(link, guestName, log, netOpts = {}) {
13
15
  super();
14
16
  this.link = link;
15
17
  this.guestName = guestName;
16
18
  this.log = log;
19
+ this.netOpts = netOpts;
17
20
  }
18
21
  connect(sinceSeq = 0) {
19
22
  return new Promise((resolve, reject) => {
20
- const ws = new WebSocket(this.link.wsUrl);
23
+ const ws = new WebSocket(this.link.wsUrl, {
24
+ // Resolve system-first, DoH-fallback (bypasses a stale NXDOMAIN negative
25
+ // cache). ws keeps SNI/Host = the hostname, so returning a DoH IP here
26
+ // does not break TLS validation or Cloudflare routing.
27
+ lookup: this.netOpts.lookup ?? makeGuestLookup(),
28
+ handshakeTimeout: this.netOpts.handshakeTimeoutMs ?? GUEST_HANDSHAKE_TIMEOUT_MS,
29
+ });
21
30
  this.ws = ws;
31
+ // Overall connect+auth deadline: handshakeTimeout only bounds DNS+TCP+TLS+
32
+ // upgrade; the post-open challenge/auth round-trip is otherwise unbounded.
33
+ let settled = false;
34
+ const deadline = setTimeout(() => {
35
+ if (settled)
36
+ return;
37
+ settled = true;
38
+ try {
39
+ ws.terminate();
40
+ }
41
+ catch {
42
+ /* already gone */
43
+ }
44
+ reject(new Error('timed out establishing tunnel'));
45
+ }, this.netOpts.connectDeadlineMs ?? GUEST_CONNECT_DEADLINE_MS);
46
+ const settleResolve = (v) => {
47
+ if (settled)
48
+ return;
49
+ settled = true;
50
+ clearTimeout(deadline);
51
+ resolve(v);
52
+ };
53
+ const settleReject = (e) => {
54
+ if (settled)
55
+ return;
56
+ settled = true;
57
+ clearTimeout(deadline);
58
+ reject(e);
59
+ };
22
60
  ws.on('message', (data) => {
23
61
  let frame;
24
62
  try {
@@ -38,10 +76,10 @@ export class GuestClient extends EventEmitter {
38
76
  else if (frame.t === 'auth_ok') {
39
77
  for (const m of frame.backlog)
40
78
  this.log.record(m);
41
- resolve({ goal: frame.goal, peerName: frame.peerName });
79
+ settleResolve({ goal: frame.goal, peerName: frame.peerName });
42
80
  }
43
81
  else if (frame.t === 'auth_fail') {
44
- reject(new Error(`auth failed: ${frame.reason}`));
82
+ settleReject(new Error(`auth failed: ${frame.reason}`));
45
83
  ws.close();
46
84
  }
47
85
  else if (frame.t === 'msg') {
@@ -56,7 +94,7 @@ export class GuestClient extends EventEmitter {
56
94
  });
57
95
  ws.on('close', () => this.failPending(new Error('tunnel disconnected')));
58
96
  ws.on('error', (err) => {
59
- reject(err);
97
+ settleReject(err);
60
98
  this.failPending(err);
61
99
  });
62
100
  });
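The `settled` flag, `settleResolve`, and `settleReject` in the hunk above form a settle-once pattern with an overall deadline. The sketch below extracts that pattern into a generic helper to show the idea on its own. `withDeadline` and its `onTimeout` hook are hypothetical names; in `GuestClient` the timeout action is `ws.terminate()`.

```javascript
// Runs `start(settleResolve, settleReject)` and rejects if neither callback
// fires within `ms`. Whichever happens first wins; later calls are ignored.
function withDeadline(start, ms, onTimeout = () => {}) {
  return new Promise((resolve, reject) => {
    let settled = false;
    const timer = setTimeout(() => {
      if (settled) return;
      settled = true;
      onTimeout(); // e.g. terminate the socket
      reject(new Error('timed out establishing tunnel'));
    }, ms);
    const settleWith = (finish) => (value) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      finish(value);
    };
    start(settleWith(resolve), settleWith(reject));
  });
}
```

Clearing the timer on the success path matters in Node.js, because a pending timer would otherwise keep the process alive until the deadline elapses.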
@@ -0,0 +1,26 @@
1
+ import type { LookupOptions } from 'node:dns';
2
+ import { dohResolve } from '../net/doh.js';
3
+ type Addr = {
4
+ address: string;
5
+ family: number;
6
+ };
7
+ type LookupCallback = (err: NodeJS.ErrnoException | null, address?: string | Addr[], family?: number) => void;
8
+ type SysLookup = (hostname: string, options: LookupOptions, callback: LookupCallback) => void;
9
+ export interface GuestLookupOpts {
10
+ dohEnabled?: boolean;
11
+ doh?: typeof dohResolve;
12
+ sys?: SysLookup;
13
+ sysTimeoutMs?: number;
14
+ retries?: number;
15
+ retryDelayMs?: number;
16
+ }
17
+ export declare function dohEnabledByDefault(): boolean;
18
+ /**
19
+ * A drop-in `dns.lookup` for the guest WebSocket. Tries the system resolver
20
+ * first (respects split-horizon/corp DNS, and is what most guests need), then —
21
+ * only on failure — falls back to DoH, so a guest whose resolver lags or holds a
22
+ * stale NXDOMAIN negative cache still connects. Returns only an address; ws/tls
23
+ * keep SNI/Host = the hostname, so returning a DoH IP does not break routing.
24
+ */
25
+ export declare function makeGuestLookup(o?: GuestLookupOpts): (hostname: string, options: LookupOptions | number, callback: LookupCallback) => void;
26
+ export {};
@@ -0,0 +1,80 @@
1
+ import { lookup as sysLookup } from 'node:dns';
2
+ import { dohResolve } from '../net/doh.js';
3
+ import { envFlag } from '../env.js';
4
+ import { GUEST_SYS_LOOKUP_TIMEOUT_MS, DOH_GUEST_RETRIES, DOH_GUEST_RETRY_DELAY_MS, } from '../config.js';
5
+ // DoH fallback is ON by default; only an explicit off/0/false/no disables it.
6
+ export function dohEnabledByDefault() {
7
+ return process.env.TUNNEL_DOH === undefined || envFlag('TUNNEL_DOH');
8
+ }
9
+ /**
10
+ * A drop-in `dns.lookup` for the guest WebSocket. Tries the system resolver
11
+ * first (respects split-horizon/corp DNS, and is what most guests need), then —
12
+ * only on failure — falls back to DoH, so a guest whose resolver lags or holds a
13
+ * stale NXDOMAIN negative cache still connects. Returns only an address; ws/tls
14
+ * keep SNI/Host = the hostname, so returning a DoH IP does not break routing.
15
+ */
16
+ export function makeGuestLookup(o = {}) {
17
+ const dohEnabled = o.dohEnabled ?? dohEnabledByDefault();
18
+ const doh = o.doh ?? dohResolve;
19
+ const sys = o.sys ?? sysLookup;
20
+ const sysTimeoutMs = o.sysTimeoutMs ?? GUEST_SYS_LOOKUP_TIMEOUT_MS;
21
+ const retries = o.retries ?? DOH_GUEST_RETRIES;
22
+ const retryDelayMs = o.retryDelayMs ?? DOH_GUEST_RETRY_DELAY_MS;
23
+ return function guestLookup(hostname, options, callback) {
24
+ const opts = typeof options === 'number' ? { family: options } : (options ?? {});
25
+ const wantAll = opts.all === true;
26
+ const family = opts.family === 6 ? 6 : 4; // prefer A/IPv4; AAAA only when explicitly asked
27
+ let settled = false;
28
+ const done = (err, address, fam) => {
29
+ if (settled)
30
+ return;
31
+ settled = true;
32
+ callback(err, address, fam);
33
+ };
34
+ // Stage 1: system resolver first, bounded so a poisoned/lagging getaddrinfo
35
+ // can't stall for seconds before we fall back to DoH.
36
+ let sysSettled = false;
37
+ const sysTimer = setTimeout(() => {
38
+ if (!sysSettled) {
39
+ sysSettled = true;
40
+ goDoh(new Error('system lookup timed out'));
41
+ }
42
+ }, sysTimeoutMs);
43
+ sys(hostname, opts, (err, address, fam) => {
44
+ if (sysSettled)
45
+ return;
46
+ sysSettled = true;
47
+ clearTimeout(sysTimer);
48
+ const ok = !err && (wantAll ? Array.isArray(address) && address.length > 0 : !!address);
49
+ if (ok)
50
+ return done(null, address, fam);
51
+ goDoh(err ?? new Error(`getaddrinfo failed for ${hostname}`));
52
+ });
53
+ function goDoh(sysErr) {
54
+ if (!dohEnabled)
55
+ return fail(sysErr);
56
+ let attempt = 0;
57
+ const tryOnce = () => {
58
+ doh(hostname, family)
59
+ .then((res) => {
60
+ if (res.klass === 'RESOLVED') {
61
+ if (wantAll)
62
+ return done(null, res.addresses.map((a) => ({ address: a.address, family: a.family })));
63
+ return done(null, res.addresses[0].address, res.addresses[0].family);
64
+ }
65
+ // NXDOMAIN (still propagating) or INDETERMINATE (DoH blocked): retry a few times.
66
+ if (++attempt < retries)
67
+ return void setTimeout(tryOnce, retryDelayMs);
68
+ fail(sysErr);
69
+ })
70
+ .catch(() => ++attempt < retries ? void setTimeout(tryOnce, retryDelayMs) : fail(sysErr));
71
+ };
72
+ tryOnce();
73
+ }
74
+ function fail(sysErr) {
75
+ const e = new Error(`could not resolve ${hostname}: system resolver failed (${sysErr.message}) and DoH (1.1.1.1/1.0.0.1/8.8.8.8) also failed`);
76
+ e.code = 'ENOTFOUND';
77
+ done(e, wantAll ? [] : '', family);
78
+ }
79
+ };
80
+ }
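Note on the file above: the two-stage flow (bounded system lookup first, fallback resolver only on failure or timeout) can be reduced to the sketch below for reading purposes. This is a simplified illustration, not the shipped code: `lookupWithFallback`, `primary`, and `fallback` are hypothetical names, the fallback is assumed to return a promise of `{ address, family }` objects, and the retry loop and the `ENOTFOUND` error wrapping from `makeGuestLookup` are left out.

```javascript
// Illustrative sketch only. `primary` has the dns.lookup callback shape;
// `fallback` is assumed to return a Promise of [{ address, family }, ...].
function lookupWithFallback(primary, fallback, timeoutMs) {
  return function lookup(hostname, options, callback) {
    const wantAll = Boolean(options) && options.all === true;
    let settled = false;     // guards the caller's callback
    let primaryDone = false; // guards stage 1: answer vs. timeout
    const done = (err, address, family) => {
      if (settled) return;
      settled = true;
      callback(err, address, family);
    };
    const useFallback = (primaryErr) => {
      fallback(hostname).then(
        (addrs) => {
          if (addrs.length === 0) return done(primaryErr);
          // Callers that pass `all: true` expect an array of address objects.
          if (wantAll) return done(null, addrs);
          done(null, addrs[0].address, addrs[0].family);
        },
        () => done(primaryErr),
      );
    };
    // Stage 1 is time-bounded so a stalled resolver cannot block the fallback.
    const timer = setTimeout(() => {
      if (primaryDone) return;
      primaryDone = true;
      useFallback(new Error('primary lookup timed out'));
    }, timeoutMs);
    primary(hostname, options, (err, address, family) => {
      if (primaryDone) return; // late answer after the timeout: ignore it
      primaryDone = true;
      clearTimeout(timer);
      const ok = !err && (wantAll ? Array.isArray(address) && address.length > 0 : Boolean(address));
      if (ok) return done(null, address, family);
      useFallback(err ?? new Error(`lookup failed for ${hostname}`));
    });
  };
}
```

Two separate flags are used on purpose: `primaryDone` decides whether the system answer or the timeout wins stage 1, while `settled` ensures the caller's callback fires exactly once regardless of which stage produced the result.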
package/dist/session.js CHANGED
@@ -7,12 +7,9 @@ import { GuestClient } from './relay/guestClient.js';
7
7
  import { ensureCloudflared as realEnsure } from './cloudflared/provision.js';
8
8
  import { startCloudflared as realStart } from './cloudflared/tunnelProcess.js';
9
9
  import { DEFAULT_LISTEN_TIMEOUT_MS, DEFAULT_IDLE_TEARDOWN_MS, DEFAULT_JOIN_LINK_TTL_MS, OPEN_RETRY_ATTEMPTS, } from './config.js';
10
- import { envFlag } from './env.js';
11
10
  const DEFAULT_DEPS = {
12
11
  ensureCloudflared: realEnsure,
13
- // Read the escape hatch per-call (not at module load) so it can be set right
14
- // before opening a tunnel.
15
- startCloudflared: (bin, port) => realStart(bin, port, envFlag('TUNNEL_SKIP_REACHABILITY_CHECK') ? { skipHealthCheck: true } : {}),
12
+ startCloudflared: (bin, port) => realStart(bin, port),
16
13
  };
17
14
  export class TunnelSession {
18
15
  deps;
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "tunnel-mcp",
3
- "version": "0.1.2",
3
+ "version": "0.1.4",
4
4
  "description": "Let two developers' Claude agents talk directly through an ephemeral, end-to-end-encrypted tunnel.",
5
5
  "type": "module",
6
6
  "bin": {