@duyquangnvx/webnovel-downloader 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +151 -0
- package/dist/chunk-7IERJRIL.js +103 -0
- package/dist/chunk-7IERJRIL.js.map +1 -0
- package/dist/chunk-D5G3S2VW.js +250 -0
- package/dist/chunk-D5G3S2VW.js.map +1 -0
- package/dist/client-OCC6BJIO.js +12 -0
- package/dist/client-OCC6BJIO.js.map +1 -0
- package/dist/errors-UKZDQ6Y3.js +25 -0
- package/dist/errors-UKZDQ6Y3.js.map +1 -0
- package/dist/index.d.ts +862 -0
- package/dist/index.js +2399 -0
- package/dist/index.js.map +1 -0
- package/package.json +79 -0
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 duyquangnvx

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

package/README.md
ADDED
@@ -0,0 +1,151 @@
# webnovel-downloader

A pluggable, type-safe webnovel downloader for Node.js. Part of the [`webnovel-studio`](https://github.com/duyquangnvx/webnovel-studio) toolkit, but usable standalone.

> **Status:** Pre-alpha. APIs are unstable.

## What it does

Given a webnovel URL from a supported site, fetch the full novel as structured data: metadata + ordered chapters. Output is format-agnostic — pair with a separate formatter package (EPUB, TXT, JSON, Markdown) to produce final files.

## Supported sites

> Sites with Cloudflare/JS challenges are tracked under M5 — transport tier.

| Site | Adapter id | TOC strategy | Notes |
|---|---|---|---|
| `truyenfull.vision` (+ `truyenfull.vn`) | `truyenfull` | S1 paginated (`/trang-N/`) | Auto-rewrites `truyenfull.vn` → `truyenfull.vision`. |
| `metruyenchu.com.vn` | `metruyenchu-com-vn` | S2 + S4 hybrid (HTML page-1 + JSON `/get/listchap/<bid>`) | Distinct from the dead `metruyenchu.com` brand. |
| `wikicv.net` (+ `truyenwikidich.net`) | `wikicv` | S4 via browser (`/book/index` XHR intercept; HMAC-signed) | Auto-rewrites `truyenwikidich.net` host. Requires the browser tier (patchright recommended). |
| `tangthuvien.net` | `tangthuvien` | scaffold — M5.1 | Site was unreachable at 2026-05-03 survey; selectors and parsers TBD. |

## Transport tier

Sites protected by Cloudflare or rendering content via JS need a real browser. Install one of:

```
pnpm add patchright # recommended for Cloudflare-fronted sites
pnpm add playwright # works for non-protected pages
```

Both are peer-optional. The Downloader resolves the module at runtime: `patchright` first, then `playwright`. The shared `downloader` singleton uses `"auto"` transport (headless). For a custom transport — e.g. a headed browser to solve a Cloudflare challenge by hand — build your own instance with `createDownloader` (it wires the built-in adapters for you):

```ts
import { createDownloader } from "@duyquangnvx/webnovel-downloader";

const dl = createDownloader({
  transport: { mode: "auto", browserOptions: { headed: true } },
});
try {
  const result = await dl.download(url);
} finally {
  await dl.dispose(); // releases the browser pool
}
```

Modes:
- `"auto"` — undici first, escalate to browser on CF challenge.
- `"http-only"` — never use the browser; adapters with `preferredTransport: "browser"` will throw.
- `"browser-required"` — every request through the browser.

Manual CF solve: pass `browserOptions.headed: true` to launch a visible window once; solve the checkbox; cookies cache for ~30 min.

Run a manual end-to-end check across the active adapters with:

```bash
pnpm smoke:live # default — first 20 chapters per adapter (~25s)
pnpm smoke:live --quick # first 5 chapters per adapter (~10s)
pnpm smoke:live --metadata-only # fetchMetadata only, no TOC walk (~3s)
```

Modes use `DownloadOptions.chapterRange` so the TOC walk short-circuits early — no novel size matters.

Adding a new site = implementing a single `SiteAdapter`. See [`docs/adapter-spec.md`](./docs/adapter-spec.md).

## Install

```bash
pnpm add @duyquangnvx/webnovel-downloader
# optional browser tier (Cloudflare / JS-rendered sites — see below):
pnpm add patchright
```

Published to npm under public access (`@duyquangnvx/webnovel-downloader`).

## Quick example

```ts
import { createDownloader } from "@duyquangnvx/webnovel-downloader";

const downloader = createDownloader({ rateLimit: { requestsPerSecond: 2 } });
const result = await downloader.download("https://truyenfull.vn/tien-nghich/", {
  concurrency: 4,
});

if (result.status === "success") {
  console.log(result.data.metadata.title);
  console.log(`${result.data.chapters.length} chapters`);
}
```

## Resume & partial downloads

`download()` always returns a result envelope — it only throws if you abort it.

```ts
const result = await downloader.download(url, { resume: true });

if (result.status === "partial") {
  // Some chapters failed; retry just those with the issued token.
  const retry = await downloader.download(url, {
    resume: { token: result.resumeToken },
  });
}
```

Download only part of a novel (chapter 1 is **index 0** — ranges are 0-based and inclusive):

```ts
await downloader.download(url, { chapterRange: { from: 0, to: 9 } }); // first 10 chapters
```

Check support without a try/catch:

```ts
downloader.canHandle("https://truyenfull.vn/x/"); // boolean
downloader.supportedSites(); // [{ id, displayName, hostnames }, ...]
```

For browser-tier sites (e.g. wikicv) in a long-lived process, build your own
instance and dispose it; the shared `downloader` singleton is fine for short
scripts:

```ts
const dl = createDownloader();
try {
  await dl.download(url);
} finally {
  await dl.dispose();
}
```

## Documentation (Source of Truth)

Read in this order:

1. [`docs/architecture.md`](./docs/architecture.md) — Big picture, layers, why
2. [`docs/data-model.md`](./docs/data-model.md) — Core types and contracts
3. [`docs/pipeline.md`](./docs/pipeline.md) — Download flow, events, errors, resume
4. [`docs/adapter-spec.md`](./docs/adapter-spec.md) — How to add a new site
5. [`docs/conventions.md`](./docs/conventions.md) — Coding standards
6. [`docs/roadmap.md`](./docs/roadmap.md) — Build sequence and milestones

## Publishing

Maintainers only. `pnpm release` runs `prepublishOnly` (builds `dist/` via tsup) and
then `pnpm publish --access public`. Only `dist/` ships (see `files`). Bump `version`
first and publish from a clean `main` (pnpm enforces git checks by default).

## License

MIT — see [`LICENSE`](./LICENSE).

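The README's range convention is easy to get off by one when the input comes from a person. Chapter 1 is index 0, and both ends of the range are inclusive. The sketch below is a hypothetical helper, not part of the package: `humanChaptersToRange`, `rangeLength` and `ChapterRangeSketch` are illustrative names, and it returns plain numbers. According to the doc comments in the bundled source map, the library builds the bounds as branded `ChapterIndex` values via `chapterIndex()`. Treat this as the arithmetic only, not as the real option type.

```typescript
// Hypothetical helper, not part of @duyquangnvx/webnovel-downloader.
// Plain numbers stand in for the library's branded ChapterIndex bounds.
interface ChapterRangeSketch {
  readonly from: number; // 0-based index of the first chapter to fetch
  readonly to: number; // 0-based index of the last chapter to fetch (inclusive)
}

// Convert human, 1-based chapter numbers (e.g. "chapters 1 to 10")
// into the 0-based, inclusive shape the README describes.
function humanChaptersToRange(firstChapter: number, lastChapter: number): ChapterRangeSketch {
  if (!Number.isInteger(firstChapter) || !Number.isInteger(lastChapter)) {
    throw new RangeError("chapter numbers must be integers");
  }
  if (firstChapter < 1 || lastChapter < firstChapter) {
    throw new RangeError(`invalid chapter span ${firstChapter}..${lastChapter}`);
  }
  // Chapter 1 is index 0: shift both ends down by one; `to` stays inclusive.
  return { from: firstChapter - 1, to: lastChapter - 1 };
}

// How many chapters an inclusive range covers.
function rangeLength(range: ChapterRangeSketch): number {
  return range.to - range.from + 1;
}

const firstTen = humanChaptersToRange(1, 10);
console.log(firstTen, rangeLength(firstTen));
```

For example, chapters 1 through 10 map to `{ from: 0, to: 9 }`. That matches the README's "first 10 chapters" example, and `rangeLength` confirms it covers 10 chapters.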
package/dist/chunk-7IERJRIL.js
ADDED
@@ -0,0 +1,103 @@
// src/core/errors.ts
var DownloadError = class extends Error {
  cause;
  constructor(message, options) {
    super(message);
    this.cause = options?.cause;
    this.name = this.constructor.name;
  }
};
var AdapterNotFoundError = class extends DownloadError {
  constructor(url, options) {
    super(`No adapter for URL: ${url}`, options);
    this.url = url;
  }
  url;
  code = "ADAPTER_NOT_FOUND";
};
var HttpError = class extends DownloadError {
  constructor(status, url, message, options) {
    super(message ?? `HTTP ${status} for ${url}`, options);
    this.status = status;
    this.url = url;
  }
  status;
  url;
  code = "HTTP_ERROR";
};
var RateLimitedError = class extends DownloadError {
  constructor(url, retryAfterMs, options) {
    super(`Rate limited at ${url}`, options);
    this.url = url;
    this.retryAfterMs = retryAfterMs;
  }
  url;
  retryAfterMs;
  code = "RATE_LIMITED";
};
var ParseError = class extends DownloadError {
  code = "PARSE_ERROR";
  url;
  snippet;
  constructor(message, options) {
    super(message, options);
    this.url = options?.url;
    this.snippet = options?.snippet;
  }
};
var ChapterFetchError = class extends DownloadError {
  constructor(ref, options) {
    super(`Chapter ${ref.index} failed: ${ref.url}`, options);
    this.ref = ref;
  }
  ref;
  code = "CHAPTER_FETCH_FAILED";
};
var TimeoutError = class extends DownloadError {
  constructor(url, options) {
    super(`Timeout for ${url}`, options);
    this.url = url;
  }
  url;
  code = "TIMEOUT";
};
var CancelledError = class extends DownloadError {
  code = "CANCELLED";
  constructor(options) {
    super("Cancelled", options);
  }
};
var BrowserModuleNotInstalledError = class extends DownloadError {
  code = "BROWSER_MODULE_NOT_INSTALLED";
  constructor(options) {
    super(
      'No headless browser module installed. Install one of:\n pnpm add patchright (recommended for Cloudflare-fronted sites)\n pnpm add playwright (works for non-protected pages)\nOr pass transport: "http-only" to disable the browser tier.',
      options
    );
  }
};
var ChallengeUnresolvedError = class extends DownloadError {
  constructor(url, hint, options) {
    const tail = hint === "manual-solve" ? 'Try transport: { mode: "auto", browserOptions: { headed: true } } and solve the checkbox in the launched window once.' : "Increase transport.browserOptions.navigationTimeoutMs or switch to a stronger transport (e.g. a Browserless/ZenRows custom HttpClient).";
    super(`Cloudflare challenge could not be resolved within the navigation timeout for ${url}. ${tail}`, options);
    this.url = url;
    this.hint = hint;
  }
  url;
  hint;
  code = "CHALLENGE_UNRESOLVED";
};

export {
  DownloadError,
  AdapterNotFoundError,
  HttpError,
  RateLimitedError,
  ParseError,
  ChapterFetchError,
  TimeoutError,
  CancelledError,
  BrowserModuleNotInstalledError,
  ChallengeUnresolvedError
};
//# sourceMappingURL=chunk-7IERJRIL.js.map

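Every class above sets a stable `code` string. The doc comments in the accompanying source map say callers can branch on `err.code` instead of using `instanceof`. Below is a minimal sketch of that pattern. It assumes a structural stand-in (`CodedError`) instead of importing the real classes. The retry / fix-setup / give-up policy in `decide` is a hypothetical choice for illustration, not behavior the package defines. Only two details come from the source: `HttpError` is documented as non-retryable, and `RateLimitedError` carries an optional `retryAfterMs`.

```typescript
// Codes copied from the `code` fields in chunk-7IERJRIL.js.
type DownloadErrorCode =
  | "ADAPTER_NOT_FOUND"
  | "HTTP_ERROR"
  | "RATE_LIMITED"
  | "PARSE_ERROR"
  | "CHAPTER_FETCH_FAILED"
  | "TIMEOUT"
  | "CANCELLED"
  | "BROWSER_MODULE_NOT_INSTALLED"
  | "CHALLENGE_UNRESOLVED";

// Hypothetical structural stand-in: only the fields this sketch reads.
interface CodedError {
  readonly code: DownloadErrorCode;
  readonly message: string;
  readonly retryAfterMs?: number; // present on RateLimitedError when known
}

type Action =
  | { readonly kind: "retry"; readonly delayMs: number }
  | { readonly kind: "fix-setup" }
  | { readonly kind: "give-up" };

// Hypothetical policy: which codes to retry, and for how long, is an assumption.
function decide(err: CodedError, defaultDelayMs = 1000): Action {
  const code = err.code;
  switch (code) {
    case "RATE_LIMITED":
      // Honour the server's hint when the error carries one.
      return { kind: "retry", delayMs: err.retryAfterMs ?? defaultDelayMs };
    case "TIMEOUT":
    case "CHAPTER_FETCH_FAILED":
      return { kind: "retry", delayMs: defaultDelayMs };
    case "BROWSER_MODULE_NOT_INSTALLED":
    case "CHALLENGE_UNRESOLVED":
      // Both messages tell the user what to install or reconfigure.
      return { kind: "fix-setup" };
    case "ADAPTER_NOT_FOUND":
    case "HTTP_ERROR":
    case "PARSE_ERROR":
    case "CANCELLED":
      return { kind: "give-up" };
    default: {
      // Exhaustiveness check: a new code added to the union but not handled
      // above makes this assignment a compile error.
      const unreachable: never = code;
      return unreachable;
    }
  }
}
```

The `never`-typed default branch keeps the switch honest. If the package later adds a code and the union is updated to match, the compiler points at every place that still needs a decision.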
package/dist/chunk-7IERJRIL.js.map
ADDED
@@ -0,0 +1 @@
{"version":3,"sources":["../src/core/errors.ts"],"sourcesContent":["import type { Url } from \"../data-model/primitives.js\";\nimport type { ChapterRef } from \"../data-model/novel.js\";\n\n/**\n * Base class for every error this library raises or returns. Each subclass\n * carries a stable, machine-readable `code` (e.g. `\"ADAPTER_NOT_FOUND\"`) so\n * callers can branch on `err.code` instead of `instanceof`.\n */\nexport abstract class DownloadError extends Error {\n abstract readonly code: string;\n override readonly cause?: unknown;\n constructor(message: string, options?: { cause?: unknown }) {\n super(message);\n this.cause = options?.cause;\n this.name = this.constructor.name;\n }\n}\n\n/**\n * No registered adapter can handle the given URL (`code: \"ADAPTER_NOT_FOUND\"`).\n * `download()` returns this in its `{ status: \"error\" }` envelope;\n * `fetchMetadata()` throws it.\n */\nexport class AdapterNotFoundError extends DownloadError {\n readonly code = \"ADAPTER_NOT_FOUND\" as const;\n constructor(\n public readonly url: string,\n options?: { cause?: unknown },\n ) {\n super(`No adapter for URL: ${url}`, options);\n }\n}\n\n/** A non-retryable HTTP error response (`code: \"HTTP_ERROR\"`); carries `status` and `url`. */\nexport class HttpError extends DownloadError {\n readonly code = \"HTTP_ERROR\" as const;\n constructor(\n public readonly status: number,\n public readonly url: Url,\n message?: string,\n options?: { cause?: unknown },\n ) {\n super(message ?? `HTTP ${status} for ${url}`, options);\n }\n}\n\n/** The host signalled rate limiting, e.g. HTTP 429 (`code: \"RATE_LIMITED\"`); `retryAfterMs` if known. 
*/\nexport class RateLimitedError extends DownloadError {\n readonly code = \"RATE_LIMITED\" as const;\n constructor(\n public readonly url: Url,\n public readonly retryAfterMs?: number,\n options?: { cause?: unknown },\n ) {\n super(`Rate limited at ${url}`, options);\n }\n}\n\n/**\n * Malformed input or an unparseable/invalid response (`code: \"PARSE_ERROR\"`) —\n * e.g. a bad URL, an out-of-range `chapterRange` bound, or adapter output that\n * fails schema validation. Carries optional `url` and `snippet` for debugging.\n */\nexport class ParseError extends DownloadError {\n readonly code = \"PARSE_ERROR\" as const;\n readonly url: Url | undefined;\n readonly snippet: string | undefined;\n constructor(message: string, options?: { cause?: unknown; url?: Url; snippet?: string }) {\n super(message, options);\n this.url = options?.url;\n this.snippet = options?.snippet;\n }\n}\n\n/**\n * A single chapter could not be fetched (`code: \"CHAPTER_FETCH_FAILED\"`);\n * carries the failing `ref`. Surfaces in {@link DownloadPartial}'s `failures`.\n */\nexport class ChapterFetchError extends DownloadError {\n readonly code = \"CHAPTER_FETCH_FAILED\" as const;\n constructor(\n public readonly ref: ChapterRef,\n options?: { cause?: unknown },\n ) {\n super(`Chapter ${ref.index} failed: ${ref.url}`, options);\n }\n}\n\n/** A request exceeded its timeout (`code: \"TIMEOUT\"`). 
*/\nexport class TimeoutError extends DownloadError {\n readonly code = \"TIMEOUT\" as const;\n constructor(\n public readonly url: Url,\n options?: { cause?: unknown },\n ) {\n super(`Timeout for ${url}`, options);\n }\n}\n\n/**\n * The operation was aborted via an `AbortSignal` (`code: \"CANCELLED\"`).\n * `download()` **throws** this rather than returning an error envelope.\n */\nexport class CancelledError extends DownloadError {\n readonly code = \"CANCELLED\" as const;\n constructor(options?: { cause?: unknown }) {\n super(\"Cancelled\", options);\n }\n}\n\n/**\n * A browser-tier request was required but neither `patchright` nor `playwright`\n * is installed (`code: \"BROWSER_MODULE_NOT_INSTALLED\"`). The message lists the\n * install commands and the `http-only` opt-out.\n */\nexport class BrowserModuleNotInstalledError extends DownloadError {\n readonly code = \"BROWSER_MODULE_NOT_INSTALLED\" as const;\n constructor(options?: { cause?: unknown }) {\n super(\n \"No headless browser module installed. Install one of:\\n\" +\n \" pnpm add patchright (recommended for Cloudflare-fronted sites)\\n\" +\n \" pnpm add playwright (works for non-protected pages)\\n\" +\n 'Or pass transport: \"http-only\" to disable the browser tier.',\n options,\n );\n }\n}\n\n/**\n * Suggested remedy on a {@link ChallengeUnresolvedError}:\n * `\"manual-solve\"` (launch a headed browser and solve once) or\n * `\"transport-config\"` (raise the navigation timeout / use a stronger transport).\n */\nexport type ChallengeUnresolvedHint = \"manual-solve\" | \"transport-config\";\n\n/**\n * A Cloudflare challenge could not be cleared within the navigation timeout\n * (`code: \"CHALLENGE_UNRESOLVED\"`). 
`hint` indicates how to recover; the message\n * spells out the concrete fix.\n */\nexport class ChallengeUnresolvedError extends DownloadError {\n readonly code = \"CHALLENGE_UNRESOLVED\" as const;\n constructor(\n public readonly url: Url,\n public readonly hint: ChallengeUnresolvedHint,\n options?: { cause?: unknown },\n ) {\n const tail =\n hint === \"manual-solve\"\n ? 'Try transport: { mode: \"auto\", browserOptions: { headed: true } } and solve the checkbox in the launched window once.'\n : \"Increase transport.browserOptions.navigationTimeoutMs or switch to a stronger transport (e.g. a Browserless/ZenRows custom HttpClient).\";\n super(`Cloudflare challenge could not be resolved within the navigation timeout for ${url}. ${tail}`, options);\n }\n}\n"],"mappings":";AAQO,IAAe,gBAAf,cAAqC,MAAM;AAAA,EAE9B;AAAA,EAClB,YAAY,SAAiB,SAA+B;AAC1D,UAAM,OAAO;AACb,SAAK,QAAQ,SAAS;AACtB,SAAK,OAAO,KAAK,YAAY;AAAA,EAC/B;AACF;AAOO,IAAM,uBAAN,cAAmC,cAAc;AAAA,EAEtD,YACkB,KAChB,SACA;AACA,UAAM,uBAAuB,GAAG,IAAI,OAAO;AAH3B;AAAA,EAIlB;AAAA,EAJkB;AAAA,EAFT,OAAO;AAOlB;AAGO,IAAM,YAAN,cAAwB,cAAc;AAAA,EAE3C,YACkB,QACA,KAChB,SACA,SACA;AACA,UAAM,WAAW,QAAQ,MAAM,QAAQ,GAAG,IAAI,OAAO;AALrC;AACA;AAAA,EAKlB;AAAA,EANkB;AAAA,EACA;AAAA,EAHT,OAAO;AASlB;AAGO,IAAM,mBAAN,cAA+B,cAAc;AAAA,EAElD,YACkB,KACA,cAChB,SACA;AACA,UAAM,mBAAmB,GAAG,IAAI,OAAO;AAJvB;AACA;AAAA,EAIlB;AAAA,EALkB;AAAA,EACA;AAAA,EAHT,OAAO;AAQlB;AAOO,IAAM,aAAN,cAAyB,cAAc;AAAA,EACnC,OAAO;AAAA,EACP;AAAA,EACA;AAAA,EACT,YAAY,SAAiB,SAA4D;AACvF,UAAM,SAAS,OAAO;AACtB,SAAK,MAAM,SAAS;AACpB,SAAK,UAAU,SAAS;AAAA,EAC1B;AACF;AAMO,IAAM,oBAAN,cAAgC,cAAc;AAAA,EAEnD,YACkB,KAChB,SACA;AACA,UAAM,WAAW,IAAI,KAAK,YAAY,IAAI,GAAG,IAAI,OAAO;AAHxC;AAAA,EAIlB;AAAA,EAJkB;AAAA,EAFT,OAAO;AAOlB;AAGO,IAAM,eAAN,cAA2B,cAAc;AAAA,EAE9C,YACkB,KAChB,SACA;AACA,UAAM,eAAe,GAAG,IAAI,OAAO;AAHnB;AAAA,EAIlB;AAAA,EAJkB;AAAA,EAFT,OAAO;AAOlB;AAMO,IAAM,iBAAN,cAA6B,cAAc;AAAA,EACvC,OAAO;AAAA,EAChB,YAAY,SAA+B;AACzC,UAAM,aAAa,OAAO;AAAA,EAC5B;AACF;AAOO,IAAM,iCAAN,cAA6C,cAAc;AAAA,EACvD,OAAO;AAAA,EACh
B,YAAY,SAA+B;AACzC;AAAA,MACE;AAAA,MAIA;AAAA,IACF;AAAA,EACF;AACF;AAcO,IAAM,2BAAN,cAAuC,cAAc;AAAA,EAE1D,YACkB,KACA,MAChB,SACA;AACA,UAAM,OACJ,SAAS,iBACL,0HACA;AACN,UAAM,gFAAgF,GAAG,KAAK,IAAI,IAAI,OAAO;AAR7F;AACA;AAAA,EAQlB;AAAA,EATkB;AAAA,EACA;AAAA,EAHT,OAAO;AAYlB;","names":[]}

package/dist/chunk-D5G3S2VW.js
ADDED
@@ -0,0 +1,250 @@
import {
  CancelledError,
  ChallengeUnresolvedError
} from "./chunk-7IERJRIL.js";

// src/data-model/primitives.ts
import { z } from "zod";
var UrlSchema = z.string().url().brand();
function unsafeBrandUrl(input) {
  return input;
}
var ChapterIndexSchema = z.number().int().nonnegative().brand();
function unsafeBrandChapterIndex(input) {
  return input;
}
function chapterIndex(input) {
  return ChapterIndexSchema.parse(input);
}
var AdapterIdSchema = z.string().min(1).brand();
function unsafeBrandAdapterId(input) {
  return input;
}
var IsoDateSchema = z.string().datetime().brand();
function unsafeBrandIsoDate(input) {
  return input;
}
var ResumeTokenSchema = z.string().min(1).brand();
function unsafeBrandResumeToken(input) {
  return input;
}

// src/http/browser/safe-close.ts
async function safeClosePage(page, logger) {
  try {
    await page.close();
  } catch (err) {
    logger?.debug({ err }, "page close failed (non-fatal)");
  }
}
async function safeCloseContext(ctx, logger) {
  try {
    await ctx.close();
  } catch (err) {
    logger?.debug({ err }, "context close failed (non-fatal)");
  }
}

// src/http/cookie-jar.ts
var CF_NAME_PATTERN = /^(cf_clearance|__cf_bm|cf_chl)/;
function isCfCookie(name) {
  return CF_NAME_PATTERN.test(name);
}
var CookieJar = class {
  #byHost = /* @__PURE__ */ new Map();
  put(host, cookies, ua) {
    const filtered = cookies.filter((c) => isCfCookie(c.name));
    const key = host.toLowerCase();
    const existing = this.#byHost.get(key);
    const nextCookies = filtered.length > 0 ? filtered : existing?.cookies ?? [];
    this.#byHost.set(key, { cookies: nextCookies, ua });
  }
  read(host) {
    const entry = this.#byHost.get(host.toLowerCase());
    if (!entry) return { cookieHeader: null, ua: null };
    const cookieHeader = entry.cookies.length > 0 ? entry.cookies.map((c) => `${c.name}=${c.value}`).join("; ") : null;
    return { cookieHeader, ua: entry.ua };
  }
};

// src/http/cf-challenge.ts
var CF_TITLE_PHRASES = "Just a moment|Ch\u1EDD m\u1ED9t ch\xFAt";
var CF_TITLE_REGEX = new RegExp(`<title>(?:${CF_TITLE_PHRASES})\\b`, "i");
var CF_BODY_MARKER = /cf-browser-verification|cdn-cgi\/challenge-platform/;
var CF_CANDIDATE_STATUSES = /* @__PURE__ */ new Set([200, 403, 429, 503]);
function isCloudflareChallenge(res) {
  if (!CF_CANDIDATE_STATUSES.has(res.status)) return false;
  if (res.headers["cf-mitigated"] === "challenge") return true;
  if (CF_TITLE_REGEX.test(res.body)) return true;
  if (CF_BODY_MARKER.test(res.body)) return true;
  return false;
}

// src/http/browser/client.ts
var RESPONSE_BUFFER_LIMIT = 200;
var CHALLENGE_TITLE = new RegExp(`^(?:${CF_TITLE_PHRASES})`, "i");
var BrowserHttpClient = class {
  #pool;
  #navTimeoutMs;
  constructor(opts) {
    this.#pool = opts.pool;
    this.#navTimeoutMs = opts.navigationTimeoutMs ?? 3e4;
  }
  async get(url, opts) {
    const handle = await this.#pool.acquireContext();
    try {
      const page = await handle.context.newPage();
      try {
        const response = await page.goto(String(url), {
          timeout: this.#navTimeoutMs,
          waitUntil: "domcontentloaded",
          ...opts?.signal ? { signal: opts.signal } : {}
        });
        const title = await page.title();
        if (CHALLENGE_TITLE.test(title)) {
          throw new ChallengeUnresolvedError(url, "manual-solve");
        }
        const body = await page.content();
        const status = response?.status() ?? 200;
        const headers = response?.headers() ?? {};
        return { status, headers, body, url };
      } finally {
        await safeClosePage(page);
      }
    } catch (err) {
      if (opts?.signal?.aborted) {
        throw new CancelledError({ cause: opts.signal.reason });
      }
      throw err;
    } finally {
      await handle.release();
    }
  }
};
var BrowserClientImpl = class {
  #context;
  #navTimeoutMs;
  constructor(opts) {
    this.#context = opts.context;
    this.#navTimeoutMs = opts.navigationTimeoutMs ?? 3e4;
  }
  async navigate(url, signal, opts = {}) {
    const page = await this.#context.newPage();
    const buffer = [];
    const onResponse = (r) => {
      buffer.push({ url: r.url(), status: r.status(), headers: r.headers(), raw: r });
      if (buffer.length > RESPONSE_BUFFER_LIMIT) buffer.shift();
    };
    page.on("response", onResponse);
    if (opts.initScript !== void 0) {
      await page.addInitScript(opts.initScript);
    }
    let navError;
    try {
      await page.goto(String(url), {
        timeout: this.#navTimeoutMs,
        waitUntil: "domcontentloaded",
        ...signal ? { signal } : {}
      });
    } catch (err) {
      navError = err;
    }
    async function waitForResponse(pattern, waitOpts) {
      if (navError !== void 0) throw navError;
      const timeoutMs = waitOpts.timeoutMs ?? 3e4;
      const innerSignal = waitOpts.signal;
      const validate = async (raw) => {
        if (waitOpts.parse === "text") return raw.text();
        const json = await raw.json();
        return waitOpts.schema.parse(json);
      };
      const existing = waitOpts.skipBuffer ? void 0 : buffer.find((r) => pattern.test(r.url));
      if (existing) {
        return {
          // Playwright Response.url() always returns an absolute URL
          url: unsafeBrandUrl(existing.url),
          status: existing.status,
          body: await validate(existing.raw)
        };
      }
      return new Promise((resolve, reject) => {
        function listener(r) {
          if (!pattern.test(r.url())) return;
          cleanup();
          validate(r).then(
            (body) => (
              // Playwright Response.url() always returns an absolute URL
              resolve({ url: unsafeBrandUrl(r.url()), status: r.status(), body })
            )
          ).catch(reject);
        }
        function onAbort() {
          cleanup();
          reject(new CancelledError({ cause: innerSignal?.reason ?? signal.reason }));
        }
        function cleanup() {
          clearTimeout(timer);
          page.off("response", listener);
          innerSignal?.removeEventListener("abort", onAbort);
          signal.removeEventListener("abort", onAbort);
        }
        const timer = setTimeout(() => {
          cleanup();
          reject(new Error(`Timeout waiting for response matching ${pattern}`));
        }, timeoutMs);
        if (innerSignal?.aborted) {
          onAbort();
          return;
        }
        if (signal.aborted) {
          onAbort();
          return;
        }
        innerSignal?.addEventListener("abort", onAbort, { once: true });
        signal.addEventListener("abort", onAbort, { once: true });
        page.on("response", listener);
      });
    }
    return {
      waitForResponse,
      async content() {
        if (navError !== void 0) throw navError;
        return page.content();
      },
      async evaluate(script) {
        if (navError !== void 0) throw navError;
        return page.evaluate(script);
      },
      async close() {
        page.off("response", onResponse);
        await safeClosePage(page);
      }
    };
  }
};
async function captureCfCookies(context, origin) {
  const all = await context.cookies([origin]);
  const cookies = all.filter((c) => isCfCookie(c.name)).map((c) => ({ name: c.name, value: c.value }));
  return { cookies, ua: null };
}

export {
  UrlSchema,
  unsafeBrandUrl,
  ChapterIndexSchema,
  unsafeBrandChapterIndex,
  chapterIndex,
  AdapterIdSchema,
  unsafeBrandAdapterId,
  IsoDateSchema,
  unsafeBrandIsoDate,
  unsafeBrandResumeToken,
  CookieJar,
  isCloudflareChallenge,
  safeClosePage,
  safeCloseContext,
  BrowserHttpClient,
  BrowserClientImpl,
  captureCfCookies
};
//# sourceMappingURL=chunk-D5G3S2VW.js.map

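`isCloudflareChallenge` treats a response as a challenge only under two conditions. First, the status must be 200, 403, 429, or 503. Second, the response must also carry one of three markers: the `cf-mitigated: challenge` header, a "Just a moment" / "Chờ một chút" title, or a challenge-platform marker in the body. The README says `"auto"` mode tries undici first and escalates to the browser on a CF challenge. The sketch below combines the two ideas. `looksLikeChallenge` restates the detector's checks, using a single regex literal for the title instead of the `new RegExp` built from `CF_TITLE_PHRASES`. `routeHttpResponse`, `SketchResponse` and `NextStep` are hypothetical names. The `"fail"` outcome for `"http-only"` is an assumption, since the README does not say what that mode does on a challenge.

```typescript
// Minimal response shape: the three fields isCloudflareChallenge reads.
interface SketchResponse {
  readonly status: number;
  readonly headers: Readonly<Record<string, string>>;
  readonly body: string;
}

// Same markers as src/http/cf-challenge.ts in the bundled source.
const CF_TITLE = /<title>(?:Just a moment|Chờ một chút)\b/i;
const CF_BODY = /cf-browser-verification|cdn-cgi\/challenge-platform/;
const CANDIDATE_STATUSES = new Set([200, 403, 429, 503]);

function looksLikeChallenge(res: SketchResponse): boolean {
  // Status alone never decides; it only gates the marker checks.
  if (!CANDIDATE_STATUSES.has(res.status)) return false;
  if (res.headers["cf-mitigated"] === "challenge") return true;
  return CF_TITLE.test(res.body) || CF_BODY.test(res.body);
}

// "browser-required" never issues a plain HTTP request, so it is excluded here.
type HttpMode = "auto" | "http-only";
type NextStep = "use-response" | "retry-in-browser" | "fail";

// Hypothetical routing of one plain-HTTP response, following the README's mode list.
function routeHttpResponse(mode: HttpMode, res: SketchResponse): NextStep {
  if (!looksLikeChallenge(res)) return "use-response";
  // Assumption: without a browser tier there is nothing to escalate to.
  return mode === "auto" ? "retry-in-browser" : "fail";
}
```

Gating on status first keeps ordinary pages cheap to classify. Requiring a marker as well avoids treating every 403 or 503 as a challenge, which is the same trade-off the source comment in `cf-challenge.ts` describes.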
package/dist/chunk-D5G3S2VW.js.map
ADDED
@@ -0,0 +1 @@
{"version":3,"sources":["../src/data-model/primitives.ts","../src/http/browser/safe-close.ts","../src/http/cookie-jar.ts","../src/http/cf-challenge.ts","../src/http/browser/client.ts"],"sourcesContent":["import { z } from \"zod\";\n\nexport const UrlSchema = z.string().url().brand<\"Url\">();\n/**\n * An absolute URL string, branded after validation so an unchecked string\n * cannot be passed where a real URL is required. Library output is already\n * branded; to brand untrusted input, use `parseUrl` from `data-model/parse.ts`.\n */\nexport type Url = z.infer<typeof UrlSchema>;\n/**\n * Brand a string as Url WITHOUT runtime validation. Use only when the caller\n * has already proved the value is a valid absolute URL (e.g. it came from\n * `new URL().toString()` or from a Zod-validated source). For untrusted input,\n * use `parseUrl` from `data-model/parse.ts` instead.\n */\nexport function unsafeBrandUrl(input: string): Url {\n return input as Url;\n}\n\nexport const ChapterIndexSchema = z.number().int().nonnegative().brand<\"ChapterIndex\">();\n/**\n * A chapter's position in the table of contents, branded as a non-negative\n * integer. Indices are **0-based**: chapter 1 is index 0. Build one with\n * {@link chapterIndex}.\n */\nexport type ChapterIndex = z.infer<typeof ChapterIndexSchema>;\n/** Brand a number as ChapterIndex without validation. See `unsafeBrandUrl`. */\nexport function unsafeBrandChapterIndex(input: number): ChapterIndex {\n return input as ChapterIndex;\n}\n/**\n * Build a validated ChapterIndex from a plain number. Throws if `input` is not\n * a non-negative integer. This is the public way to construct the bounds for\n * `DownloadOptions.chapterRange` (indices are 0-based: chapter 1 is index 0).\n */\nexport function chapterIndex(input: number): ChapterIndex {\n return ChapterIndexSchema.parse(input);\n}\n\nexport const AdapterIdSchema = z.string().min(1).brand<\"AdapterId\">();\n/** Stable identifier for a site adapter (e.g. 
`\"truyenfull\"`), branded as a non-empty string. */\nexport type AdapterId = z.infer<typeof AdapterIdSchema>;\n/** Brand a string as AdapterId without validation. See `unsafeBrandUrl`. */\nexport function unsafeBrandAdapterId(input: string): AdapterId {\n return input as AdapterId;\n}\n\nexport const IsoDateSchema = z.string().datetime().brand<\"IsoDate\">();\n/** An ISO-8601 datetime string (e.g. `\"2026-06-17T09:30:00.000Z\"`), branded after validation. */\nexport type IsoDate = z.infer<typeof IsoDateSchema>;\n/** Brand a string as IsoDate without validation. See `unsafeBrandUrl`. */\nexport function unsafeBrandIsoDate(input: string): IsoDate {\n return input as IsoDate;\n}\n\nexport const ResumeTokenSchema = z.string().min(1).brand<\"ResumeToken\">();\n/**\n * Opaque token returned on a partial download (see {@link DownloadPartial}).\n * Pass it back via `DownloadOptions.resume` to continue where the run left off.\n * Branded as a non-empty string.\n */\nexport type ResumeToken = z.infer<typeof ResumeTokenSchema>;\n/** Brand a string as ResumeToken without validation. See `unsafeBrandUrl`. 
*/\nexport function unsafeBrandResumeToken(input: string): ResumeToken {\n return input as ResumeToken;\n}\n","import type { Logger } from \"../../data-model/ports.js\";\n\ninterface Closeable {\n close(): Promise<void>;\n}\n\nexport async function safeClosePage(page: Closeable, logger?: Logger): Promise<void> {\n try {\n await page.close();\n } catch (err) {\n logger?.debug({ err }, \"page close failed (non-fatal)\");\n }\n}\n\nexport async function safeCloseContext(ctx: Closeable, logger?: Logger): Promise<void> {\n try {\n await ctx.close();\n } catch (err) {\n logger?.debug({ err }, \"context close failed (non-fatal)\");\n }\n}\n","export interface Cookie {\n readonly name: string;\n readonly value: string;\n}\n\nexport interface CookieJarReadResult {\n readonly cookieHeader: string | null;\n readonly ua: string | null;\n}\n\ninterface Entry {\n readonly cookies: readonly Cookie[];\n readonly ua: string;\n}\n\nconst CF_NAME_PATTERN = /^(cf_clearance|__cf_bm|cf_chl)/;\n\nexport function isCfCookie(name: string): boolean {\n return CF_NAME_PATTERN.test(name);\n}\n\nexport class CookieJar {\n readonly #byHost = new Map<string, Entry>();\n\n put(host: string, cookies: readonly Cookie[], ua: string): void {\n // The UA pin is needed for HTTP-branch session continuity even when CF\n // hasn't issued cookies yet — dropping the entry on `filtered.length === 0`\n // would let the rotating UA client churn UAs until the first challenge.\n const filtered = cookies.filter((c) => isCfCookie(c.name));\n const key = host.toLowerCase();\n const existing = this.#byHost.get(key);\n const nextCookies = filtered.length > 0 ? filtered : (existing?.cookies ?? []);\n this.#byHost.set(key, { cookies: nextCookies, ua });\n }\n\n read(host: string): CookieJarReadResult {\n const entry = this.#byHost.get(host.toLowerCase());\n if (!entry) return { cookieHeader: null, ua: null };\n const cookieHeader =\n entry.cookies.length > 0\n ? 
entry.cookies.map((c) => `${c.name}=${c.value}`).join(\"; \")\n : null;\n return { cookieHeader, ua: entry.ua };\n }\n}\n","import type { HttpResponse } from \"./types.js\";\n\n// Raw regex-alternation fragment, not a plain string: callers splice it into a\n// `new RegExp()`. Phrases must stay metacharacter-free or the built regex shifts.\nexport const CF_TITLE_PHRASES = \"Just a moment|Chờ một chút\";\nconst CF_TITLE_REGEX = new RegExp(`<title>(?:${CF_TITLE_PHRASES})\\\\b`, \"i\");\nconst CF_BODY_MARKER = /cf-browser-verification|cdn-cgi\\/challenge-platform/;\n// Cloudflare serves challenge pages on 403 (block), 503 (\"Under Attack\"),\n// 429 (rate-gate), and occasionally 200 (Turnstile interstitial). The body /\n// header markers below still gate the decision; status alone is not enough.\nconst CF_CANDIDATE_STATUSES = new Set([200, 403, 429, 503]);\n\nexport function isCloudflareChallenge(res: HttpResponse): boolean {\n if (!CF_CANDIDATE_STATUSES.has(res.status)) return false;\n if (res.headers[\"cf-mitigated\"] === \"challenge\") return true;\n if (CF_TITLE_REGEX.test(res.body)) return true;\n if (CF_BODY_MARKER.test(res.body)) return true;\n return false;\n}\n","// src/http/browser/client.ts\nimport type { ZodType } from \"zod\";\nimport type { Url } from \"../../data-model/primitives.js\";\nimport { unsafeBrandUrl } from \"../../data-model/primitives.js\";\nimport type { HttpClient, HttpRequestOptions, HttpResponse } from \"../types.js\";\nimport type {\n BrowserClient,\n BrowserPage,\n NavigateOptions,\n RuntimeContext,\n RuntimePage,\n RuntimeResponse,\n WaitForResponseJsonOpts,\n WaitForResponseTextOpts,\n} from \"./types.js\";\nimport type { BrowserPool } from \"./pool.js\";\nimport { safeClosePage } from \"./safe-close.js\";\nimport { ChallengeUnresolvedError, CancelledError } from \"../../core/errors.js\";\nimport { isCfCookie } from \"../cookie-jar.js\";\nimport { CF_TITLE_PHRASES } from \"../cf-challenge.js\";\n\ntype WaitOpts =\n | {\n readonly 
timeoutMs?: number;\n readonly signal?: AbortSignal;\n readonly parse: \"text\";\n readonly skipBuffer?: boolean;\n }\n | {\n readonly timeoutMs?: number;\n readonly signal?: AbortSignal;\n readonly parse: \"json\";\n readonly schema: ZodType<unknown>;\n readonly skipBuffer?: boolean;\n };\n\ninterface BufferedResponse {\n url: string;\n status: number;\n headers: Record<string, string>;\n raw: RuntimeResponse;\n}\n\n// Cap how many responses we retain between `navigate()` and `waitForResponse()`.\n// Each entry holds a strong ref to a Playwright Response (with its body buffer),\n// so on heavy-XHR pages an uncapped buffer leaks memory for the page's lifetime.\n// 200 is large enough for the fast-path use case (target XHR fired during\n// goto/before listener attach) on real pages.\nconst RESPONSE_BUFFER_LIMIT = 200;\n\nconst CHALLENGE_TITLE = new RegExp(`^(?:${CF_TITLE_PHRASES})`, \"i\");\n\nexport class BrowserHttpClient implements HttpClient {\n readonly #pool: BrowserPool;\n readonly #navTimeoutMs: number;\n\n constructor(opts: { pool: BrowserPool; navigationTimeoutMs?: number }) {\n this.#pool = opts.pool;\n this.#navTimeoutMs = opts.navigationTimeoutMs ?? 30_000;\n }\n\n async get(url: Url, opts?: HttpRequestOptions): Promise<HttpResponse> {\n const handle = await this.#pool.acquireContext();\n try {\n const page = await handle.context.newPage();\n try {\n const response = await page.goto(String(url), {\n timeout: this.#navTimeoutMs,\n waitUntil: \"domcontentloaded\",\n ...(opts?.signal ? { signal: opts.signal } : {}),\n });\n const title = await page.title();\n if (CHALLENGE_TITLE.test(title)) {\n throw new ChallengeUnresolvedError(url, \"manual-solve\");\n }\n const body = await page.content();\n const status = response?.status() ?? 200;\n const headers = response?.headers() ?? 
{};\n return { status, headers, body, url };\n } finally {\n await safeClosePage(page);\n }\n } catch (err) {\n if (opts?.signal?.aborted) {\n throw new CancelledError({ cause: opts.signal.reason });\n }\n throw err;\n } finally {\n await handle.release();\n }\n }\n}\n\nexport class BrowserClientImpl implements BrowserClient {\n readonly #context: RuntimeContext;\n readonly #navTimeoutMs: number;\n\n constructor(opts: { context: RuntimeContext; navigationTimeoutMs?: number }) {\n this.#context = opts.context;\n this.#navTimeoutMs = opts.navigationTimeoutMs ?? 30_000;\n }\n\n async navigate(\n url: Url,\n signal: AbortSignal,\n opts: NavigateOptions = {},\n ): Promise<BrowserPage> {\n const page: RuntimePage = await this.#context.newPage();\n const buffer: BufferedResponse[] = [];\n const onResponse = (r: RuntimeResponse): void => {\n buffer.push({ url: r.url(), status: r.status(), headers: r.headers(), raw: r });\n if (buffer.length > RESPONSE_BUFFER_LIMIT) buffer.shift();\n };\n page.on(\"response\", onResponse);\n\n if (opts.initScript !== undefined) {\n await page.addInitScript(opts.initScript);\n }\n\n let navError: unknown;\n try {\n await page.goto(String(url), {\n timeout: this.#navTimeoutMs,\n waitUntil: \"domcontentloaded\",\n ...(signal ? { signal } : {}),\n });\n } catch (err) {\n navError = err;\n }\n\n function waitForResponse(\n pattern: RegExp,\n waitOpts: WaitForResponseTextOpts,\n ): Promise<{ url: Url; status: number; body: string }>;\n function waitForResponse<T>(\n pattern: RegExp,\n waitOpts: WaitForResponseJsonOpts<T>,\n ): Promise<{ url: Url; status: number; body: T }>;\n async function waitForResponse(\n pattern: RegExp,\n waitOpts: WaitOpts,\n ): Promise<{ url: Url; status: number; body: unknown }> {\n if (navError !== undefined) throw navError;\n const timeoutMs = waitOpts.timeoutMs ?? 
30_000;\n const innerSignal = waitOpts.signal;\n const validate = async (raw: RuntimeResponse): Promise<unknown> => {\n if (waitOpts.parse === \"text\") return raw.text();\n const json = await raw.json();\n return waitOpts.schema.parse(json);\n };\n\n const existing = waitOpts.skipBuffer\n ? undefined\n : buffer.find((r) => pattern.test(r.url));\n if (existing) {\n return {\n // Playwright Response.url() always returns an absolute URL\n url: unsafeBrandUrl(existing.url),\n status: existing.status,\n body: await validate(existing.raw),\n };\n }\n\n return new Promise<{ url: Url; status: number; body: unknown }>((resolve, reject) => {\n function listener(r: RuntimeResponse): void {\n if (!pattern.test(r.url())) return;\n cleanup();\n validate(r)\n .then((body) =>\n // Playwright Response.url() always returns an absolute URL\n resolve({ url: unsafeBrandUrl(r.url()), status: r.status(), body }),\n )\n .catch(reject);\n }\n\n function onAbort(): void {\n cleanup();\n reject(new CancelledError({ cause: innerSignal?.reason ?? 
signal.reason }));\n }\n\n function cleanup(): void {\n clearTimeout(timer);\n page.off(\"response\", listener);\n innerSignal?.removeEventListener(\"abort\", onAbort);\n signal.removeEventListener(\"abort\", onAbort);\n }\n\n const timer = setTimeout(() => {\n cleanup();\n reject(new Error(`Timeout waiting for response matching ${pattern}`));\n }, timeoutMs);\n\n if (innerSignal?.aborted) { onAbort(); return; }\n if (signal.aborted) { onAbort(); return; }\n innerSignal?.addEventListener(\"abort\", onAbort, { once: true });\n signal.addEventListener(\"abort\", onAbort, { once: true });\n page.on(\"response\", listener);\n });\n }\n\n return {\n waitForResponse,\n\n async content(): Promise<string> {\n if (navError !== undefined) throw navError;\n return page.content();\n },\n\n async evaluate(script: string): Promise<unknown> {\n if (navError !== undefined) throw navError;\n return page.evaluate(script);\n },\n\n async close(): Promise<void> {\n page.off(\"response\", onResponse);\n await safeClosePage(page);\n },\n };\n }\n}\n\n/** Snapshot a context's CF cookies for jar replay. */\nexport async function captureCfCookies(\n context: RuntimeContext,\n origin: string,\n): Promise<{ cookies: ReadonlyArray<{ name: string; value: string }>; ua: string | null }> {\n const all = await context.cookies([origin]);\n const cookies = all\n .filter((c) => isCfCookie(c.name))\n .map((c) => ({ name: c.name, value: c.value }));\n // UA is set per-context at newContext({userAgent}) time; the pool callsite\n // owns this. 
captureCfCookies returns ua: null and the orchestrator pins the UA elsewhere.\n return { cookies, ua: null };\n}\n"],"mappings":";;;;;;AAAA,SAAS,SAAS;AAEX,IAAM,YAAY,EAAE,OAAO,EAAE,IAAI,EAAE,MAAa;AAahD,SAAS,eAAe,OAAoB;AACjD,SAAO;AACT;AAEO,IAAM,qBAAqB,EAAE,OAAO,EAAE,IAAI,EAAE,YAAY,EAAE,MAAsB;AAQhF,SAAS,wBAAwB,OAA6B;AACnE,SAAO;AACT;AAMO,SAAS,aAAa,OAA6B;AACxD,SAAO,mBAAmB,MAAM,KAAK;AACvC;AAEO,IAAM,kBAAkB,EAAE,OAAO,EAAE,IAAI,CAAC,EAAE,MAAmB;AAI7D,SAAS,qBAAqB,OAA0B;AAC7D,SAAO;AACT;AAEO,IAAM,gBAAgB,EAAE,OAAO,EAAE,SAAS,EAAE,MAAiB;AAI7D,SAAS,mBAAmB,OAAwB;AACzD,SAAO;AACT;AAEO,IAAM,oBAAoB,EAAE,OAAO,EAAE,IAAI,CAAC,EAAE,MAAqB;AAQjE,SAAS,uBAAuB,OAA4B;AACjE,SAAO;AACT;;;AC3DA,eAAsB,cAAc,MAAiB,QAAgC;AACnF,MAAI;AACF,UAAM,KAAK,MAAM;AAAA,EACnB,SAAS,KAAK;AACZ,YAAQ,MAAM,EAAE,IAAI,GAAG,+BAA+B;AAAA,EACxD;AACF;AAEA,eAAsB,iBAAiB,KAAgB,QAAgC;AACrF,MAAI;AACF,UAAM,IAAI,MAAM;AAAA,EAClB,SAAS,KAAK;AACZ,YAAQ,MAAM,EAAE,IAAI,GAAG,kCAAkC;AAAA,EAC3D;AACF;;;ACLA,IAAM,kBAAkB;AAEjB,SAAS,WAAW,MAAuB;AAChD,SAAO,gBAAgB,KAAK,IAAI;AAClC;AAEO,IAAM,YAAN,MAAgB;AAAA,EACZ,UAAU,oBAAI,IAAmB;AAAA,EAE1C,IAAI,MAAc,SAA4B,IAAkB;AAI9D,UAAM,WAAW,QAAQ,OAAO,CAAC,MAAM,WAAW,EAAE,IAAI,CAAC;AACzD,UAAM,MAAM,KAAK,YAAY;AAC7B,UAAM,WAAW,KAAK,QAAQ,IAAI,GAAG;AACrC,UAAM,cAAc,SAAS,SAAS,IAAI,WAAY,UAAU,WAAW,CAAC;AAC5E,SAAK,QAAQ,IAAI,KAAK,EAAE,SAAS,aAAa,GAAG,CAAC;AAAA,EACpD;AAAA,EAEA,KAAK,MAAmC;AACtC,UAAM,QAAQ,KAAK,QAAQ,IAAI,KAAK,YAAY,CAAC;AACjD,QAAI,CAAC,MAAO,QAAO,EAAE,cAAc,MAAM,IAAI,KAAK;AAClD,UAAM,eACJ,MAAM,QAAQ,SAAS,IACnB,MAAM,QAAQ,IAAI,CAAC,MAAM,GAAG,EAAE,IAAI,IAAI,EAAE,KAAK,EAAE,EAAE,KAAK,IAAI,IAC1D;AACN,WAAO,EAAE,cAAc,IAAI,MAAM,GAAG;AAAA,EACtC;AACF;;;ACxCO,IAAM,mBAAmB;AAChC,IAAM,iBAAiB,IAAI,OAAO,aAAa,gBAAgB,QAAQ,GAAG;AAC1E,IAAM,iBAAiB;AAIvB,IAAM,wBAAwB,oBAAI,IAAI,CAAC,KAAK,KAAK,KAAK,GAAG,CAAC;AAEnD,SAAS,sBAAsB,KAA4B;AAChE,MAAI,CAAC,sBAAsB,IAAI,IAAI,MAAM,EAAG,QAAO;AACnD,MAAI,IAAI,QAAQ,cAAc,MAAM,YAAa,QAAO;AACxD,MAAI,eAAe,KAAK,IAAI,IAAI,EAAG,QAAO;AAC1C,MAAI,eAAe,KAAK,IAAI,IAAI,EAAG,QAAO;AAC1C,SAAO;AACT;;;AC8BA,IAAM,wBAAwB;AAE9B,IAAM,kBAAkB
,IAAI,OAAO,OAAO,gBAAgB,KAAK,GAAG;AAE3D,IAAM,oBAAN,MAA8C;AAAA,EAC1C;AAAA,EACA;AAAA,EAET,YAAY,MAA2D;AACrE,SAAK,QAAQ,KAAK;AAClB,SAAK,gBAAgB,KAAK,uBAAuB;AAAA,EACnD;AAAA,EAEA,MAAM,IAAI,KAAU,MAAkD;AACpE,UAAM,SAAS,MAAM,KAAK,MAAM,eAAe;AAC/C,QAAI;AACF,YAAM,OAAO,MAAM,OAAO,QAAQ,QAAQ;AAC1C,UAAI;AACF,cAAM,WAAW,MAAM,KAAK,KAAK,OAAO,GAAG,GAAG;AAAA,UAC5C,SAAS,KAAK;AAAA,UACd,WAAW;AAAA,UACX,GAAI,MAAM,SAAS,EAAE,QAAQ,KAAK,OAAO,IAAI,CAAC;AAAA,QAChD,CAAC;AACD,cAAM,QAAQ,MAAM,KAAK,MAAM;AAC/B,YAAI,gBAAgB,KAAK,KAAK,GAAG;AAC/B,gBAAM,IAAI,yBAAyB,KAAK,cAAc;AAAA,QACxD;AACA,cAAM,OAAO,MAAM,KAAK,QAAQ;AAChC,cAAM,SAAS,UAAU,OAAO,KAAK;AACrC,cAAM,UAAU,UAAU,QAAQ,KAAK,CAAC;AACxC,eAAO,EAAE,QAAQ,SAAS,MAAM,IAAI;AAAA,MACtC,UAAE;AACA,cAAM,cAAc,IAAI;AAAA,MAC1B;AAAA,IACF,SAAS,KAAK;AACZ,UAAI,MAAM,QAAQ,SAAS;AACzB,cAAM,IAAI,eAAe,EAAE,OAAO,KAAK,OAAO,OAAO,CAAC;AAAA,MACxD;AACA,YAAM;AAAA,IACR,UAAE;AACA,YAAM,OAAO,QAAQ;AAAA,IACvB;AAAA,EACF;AACF;AAEO,IAAM,oBAAN,MAAiD;AAAA,EAC7C;AAAA,EACA;AAAA,EAET,YAAY,MAAiE;AAC3E,SAAK,WAAW,KAAK;AACrB,SAAK,gBAAgB,KAAK,uBAAuB;AAAA,EACnD;AAAA,EAEA,MAAM,SACJ,KACA,QACA,OAAwB,CAAC,GACH;AACtB,UAAM,OAAoB,MAAM,KAAK,SAAS,QAAQ;AACtD,UAAM,SAA6B,CAAC;AACpC,UAAM,aAAa,CAAC,MAA6B;AAC/C,aAAO,KAAK,EAAE,KAAK,EAAE,IAAI,GAAG,QAAQ,EAAE,OAAO,GAAG,SAAS,EAAE,QAAQ,GAAG,KAAK,EAAE,CAAC;AAC9E,UAAI,OAAO,SAAS,sBAAuB,QAAO,MAAM;AAAA,IAC1D;AACA,SAAK,GAAG,YAAY,UAAU;AAE9B,QAAI,KAAK,eAAe,QAAW;AACjC,YAAM,KAAK,cAAc,KAAK,UAAU;AAAA,IAC1C;AAEA,QAAI;AACJ,QAAI;AACF,YAAM,KAAK,KAAK,OAAO,GAAG,GAAG;AAAA,QAC3B,SAAS,KAAK;AAAA,QACd,WAAW;AAAA,QACX,GAAI,SAAS,EAAE,OAAO,IAAI,CAAC;AAAA,MAC7B,CAAC;AAAA,IACH,SAAS,KAAK;AACZ,iBAAW;AAAA,IACb;AAUA,mBAAe,gBACb,SACA,UACsD;AACtD,UAAI,aAAa,OAAW,OAAM;AAClC,YAAM,YAAY,SAAS,aAAa;AACxC,YAAM,cAAc,SAAS;AAC7B,YAAM,WAAW,OAAO,QAA2C;AACjE,YAAI,SAAS,UAAU,OAAQ,QAAO,IAAI,KAAK;AAC/C,cAAM,OAAO,MAAM,IAAI,KAAK;AAC5B,eAAO,SAAS,OAAO,MAAM,IAAI;AAAA,MACnC;AAEA,YAAM,WAAW,SAAS,aACtB,SACA,OAAO,KAAK,CAAC,MAAM,QAAQ,KAAK,EAAE,GAAG,CAAC;AAC1C,UAAI,UAAU;AACZ,eAAO;AAAA;AAAA,UAEL,KAAK,eAAe,SAAS,GAAG;AAAA,UAChC,QAAQ,SAAS;AAAA,UACjB,MAA
M,MAAM,SAAS,SAAS,GAAG;AAAA,QACnC;AAAA,MACF;AAEA,aAAO,IAAI,QAAqD,CAAC,SAAS,WAAW;AACnF,iBAAS,SAAS,GAA0B;AAC1C,cAAI,CAAC,QAAQ,KAAK,EAAE,IAAI,CAAC,EAAG;AAC5B,kBAAQ;AACR,mBAAS,CAAC,EACP;AAAA,YAAK,CAAC;AAAA;AAAA,cAEL,QAAQ,EAAE,KAAK,eAAe,EAAE,IAAI,CAAC,GAAG,QAAQ,EAAE,OAAO,GAAG,KAAK,CAAC;AAAA;AAAA,UACpE,EACC,MAAM,MAAM;AAAA,QACjB;AAEA,iBAAS,UAAgB;AACvB,kBAAQ;AACR,iBAAO,IAAI,eAAe,EAAE,OAAO,aAAa,UAAU,OAAO,OAAO,CAAC,CAAC;AAAA,QAC5E;AAEA,iBAAS,UAAgB;AACvB,uBAAa,KAAK;AAClB,eAAK,IAAI,YAAY,QAAQ;AAC7B,uBAAa,oBAAoB,SAAS,OAAO;AACjD,iBAAO,oBAAoB,SAAS,OAAO;AAAA,QAC7C;AAEA,cAAM,QAAQ,WAAW,MAAM;AAC7B,kBAAQ;AACR,iBAAO,IAAI,MAAM,yCAAyC,OAAO,EAAE,CAAC;AAAA,QACtE,GAAG,SAAS;AAEZ,YAAI,aAAa,SAAS;AAAE,kBAAQ;AAAG;AAAA,QAAQ;AAC/C,YAAI,OAAO,SAAS;AAAE,kBAAQ;AAAG;AAAA,QAAQ;AACzC,qBAAa,iBAAiB,SAAS,SAAS,EAAE,MAAM,KAAK,CAAC;AAC9D,eAAO,iBAAiB,SAAS,SAAS,EAAE,MAAM,KAAK,CAAC;AACxD,aAAK,GAAG,YAAY,QAAQ;AAAA,MAC9B,CAAC;AAAA,IACH;AAEA,WAAO;AAAA,MACL;AAAA,MAEA,MAAM,UAA2B;AAC/B,YAAI,aAAa,OAAW,OAAM;AAClC,eAAO,KAAK,QAAQ;AAAA,MACtB;AAAA,MAEA,MAAM,SAAS,QAAkC;AAC/C,YAAI,aAAa,OAAW,OAAM;AAClC,eAAO,KAAK,SAAS,MAAM;AAAA,MAC7B;AAAA,MAEA,MAAM,QAAuB;AAC3B,aAAK,IAAI,YAAY,UAAU;AAC/B,cAAM,cAAc,IAAI;AAAA,MAC1B;AAAA,IACF;AAAA,EACF;AACF;AAGA,eAAsB,iBACpB,SACA,QACyF;AACzF,QAAM,MAAM,MAAM,QAAQ,QAAQ,CAAC,MAAM,CAAC;AAC1C,QAAM,UAAU,IACb,OAAO,CAAC,MAAM,WAAW,EAAE,IAAI,CAAC,EAChC,IAAI,CAAC,OAAO,EAAE,MAAM,EAAE,MAAM,OAAO,EAAE,MAAM,EAAE;AAGhD,SAAO,EAAE,SAAS,IAAI,KAAK;AAC7B;","names":[]}
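The `isCloudflareChallenge` check in `src/http/cf-challenge.ts` (embedded in the source map above) uses the status code only as a gate; the decision itself comes from the `cf-mitigated` header, the page title, or a body marker. The standalone sketch below reproduces that decision order so it can be exercised without the package. `SimpleResponse` and `looksLikeCfChallenge` are hypothetical names, not exports of this package.

```typescript
// Hypothetical stand-in for the package's HttpResponse type.
interface SimpleResponse {
  readonly status: number;
  readonly headers: Record<string, string>;
  readonly body: string;
}

// Regex-alternation fragment spliced into a RegExp, as in cf-challenge.ts;
// the phrases must stay free of regex metacharacters.
const CF_TITLE_PHRASES = "Just a moment|Chờ một chút";
const CF_TITLE_REGEX = new RegExp(`<title>(?:${CF_TITLE_PHRASES})\\b`, "i");
const CF_BODY_MARKER = /cf-browser-verification|cdn-cgi\/challenge-platform/;
// 403 (block), 503 ("Under Attack"), 429 (rate gate), 200 (interstitial).
const CF_CANDIDATE_STATUSES = new Set([200, 403, 429, 503]);

function looksLikeCfChallenge(res: SimpleResponse): boolean {
  // Status alone never flags a page; it only gates the marker checks.
  if (!CF_CANDIDATE_STATUSES.has(res.status)) return false;
  if (res.headers["cf-mitigated"] === "challenge") return true;
  return CF_TITLE_REGEX.test(res.body) || CF_BODY_MARKER.test(res.body);
}
```

With this order, a 404 page that happens to contain "Just a moment" in its title is not flagged, because 404 is outside the candidate status set.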

@@ -0,0 +1,12 @@
+import {
+  BrowserClientImpl,
+  BrowserHttpClient,
+  captureCfCookies
+} from "./chunk-D5G3S2VW.js";
+import "./chunk-7IERJRIL.js";
+export {
+  BrowserClientImpl,
+  BrowserHttpClient,
+  captureCfCookies
+};
+//# sourceMappingURL=client-OCC6BJIO.js.map
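The chunk above re-exports `captureCfCookies`, which keeps only cookies whose names start with `cf_clearance`, `__cf_bm`, or `cf_chl` and returns `ua: null`. The embedded `CookieJar` source then stores such cookies per lower-cased host together with a pinned user agent. A hedged, self-contained sketch of that jar behaviour follows; `CfCookieJar` and `CfCookie` are hypothetical names. Note that a capture with no CF cookies keeps the earlier cookies but still updates the UA.

```typescript
interface CfCookie {
  readonly name: string;
  readonly value: string;
}

interface JarEntry {
  readonly cookies: readonly CfCookie[];
  readonly ua: string;
}

// Same prefix test as isCfCookie() in src/http/cookie-jar.ts.
const CF_NAME_PATTERN = /^(cf_clearance|__cf_bm|cf_chl)/;

class CfCookieJar {
  private readonly byHost = new Map<string, JarEntry>();

  put(host: string, cookies: readonly CfCookie[], ua: string): void {
    const key = host.toLowerCase();
    const cfOnly = cookies.filter((c) => CF_NAME_PATTERN.test(c.name));
    // No CF cookies this time: keep the earlier ones, but still pin the new UA.
    const previous = this.byHost.get(key)?.cookies ?? [];
    this.byHost.set(key, { cookies: cfOnly.length > 0 ? cfOnly : previous, ua });
  }

  read(host: string): { cookieHeader: string | null; ua: string | null } {
    const entry = this.byHost.get(host.toLowerCase());
    if (!entry) return { cookieHeader: null, ua: null };
    // Serialize as one Cookie request header value: "a=1; b=2".
    const cookieHeader =
      entry.cookies.length > 0
        ? entry.cookies.map((c) => `${c.name}=${c.value}`).join("; ")
        : null;
    return { cookieHeader, ua: entry.ua };
  }
}
```

Pinning the UA even before Cloudflare issues any cookie keeps the plain-HTTP branch on one stable identity instead of letting a rotating UA churn until the first challenge.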

@@ -0,0 +1 @@
+{"version":3,"sources":[],"sourcesContent":[],"mappings":"","names":[]}
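`BrowserClientImpl.navigate()` in the embedded `client.ts` records every response in an array capped at `RESPONSE_BUFFER_LIMIT` (200). Unless `skipBuffer` is set, `waitForResponse()` returns the first buffered match before attaching a live listener. A minimal sketch of that bounded buffer, assuming a simplified `Buffered` entry in place of the Playwright response wrapper (`Buffered` and `BoundedResponseBuffer` are hypothetical names):

```typescript
// Simplified stand-in for the buffered Playwright response.
interface Buffered {
  readonly url: string;
  readonly status: number;
}

class BoundedResponseBuffer {
  private readonly items: Buffered[] = [];

  constructor(private readonly limit: number) {}

  push(item: Buffered): void {
    this.items.push(item);
    // Drop the oldest entry once over the cap, so heavy-XHR pages
    // cannot grow the buffer for the page's whole lifetime.
    if (this.items.length > this.limit) this.items.shift();
  }

  // Fast path: the first (oldest) retained response whose URL matches.
  findFirst(pattern: RegExp): Buffered | undefined {
    return this.items.find((r) => pattern.test(r.url));
  }

  get size(): number {
    return this.items.length;
  }
}
```

`Array.prototype.shift()` is O(n), which is acceptable at a cap of 200; a ring buffer would only pay off for much larger limits. A regex with the `g` or `y` flag carries `lastIndex` across `test()` calls, so the URL pattern should be non-global.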

@@ -0,0 +1,25 @@
+import {
+  AdapterNotFoundError,
+  BrowserModuleNotInstalledError,
+  CancelledError,
+  ChallengeUnresolvedError,
+  ChapterFetchError,
+  DownloadError,
+  HttpError,
+  ParseError,
+  RateLimitedError,
+  TimeoutError
+} from "./chunk-7IERJRIL.js";
+export {
+  AdapterNotFoundError,
+  BrowserModuleNotInstalledError,
+  CancelledError,
+  ChallengeUnresolvedError,
+  ChapterFetchError,
+  DownloadError,
+  HttpError,
+  ParseError,
+  RateLimitedError,
+  TimeoutError
+};
+//# sourceMappingURL=errors-UKZDQ6Y3.js.map
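The errors chunk above re-exports `CancelledError`, which `waitForResponse()` in the embedded client source throws when either the per-call or the navigation `AbortSignal` fires. On timeout it rejects with a plain `Error`, and every exit path clears the timer and removes all listeners. The sketch below shows the same race with Node's `EventEmitter` standing in for the Playwright page and a single signal for brevity. `waitForEvent` and `CancelledSketchError` are hypothetical names; the package's real `CancelledError` is not imported.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for the package's CancelledError.
class CancelledSketchError extends Error {
  constructor(readonly reason: unknown) {
    super("operation cancelled");
    this.name = "CancelledSketchError";
    // Keep instanceof working if compiled to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

function waitForEvent<T>(
  emitter: EventEmitter,
  event: string,
  matches: (value: T) => boolean,
  opts: { timeoutMs: number; signal?: AbortSignal },
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const { signal } = opts;
    // Single teardown shared by the match, abort, and timeout paths.
    const cleanup = (): void => {
      clearTimeout(timer);
      emitter.off(event, listener);
      signal?.removeEventListener("abort", onAbort);
    };
    const listener = (value: T): void => {
      if (!matches(value)) return; // ignore unrelated events
      cleanup();
      resolve(value);
    };
    const onAbort = (): void => {
      cleanup();
      reject(new CancelledSketchError(signal?.reason));
    };
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`Timeout waiting for ${event}`));
    }, opts.timeoutMs);

    // An already-aborted signal rejects before anything is subscribed.
    if (signal?.aborted) {
      onAbort();
      return;
    }
    signal?.addEventListener("abort", onAbort, { once: true });
    emitter.on(event, listener);
  });
}
```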

@@ -0,0 +1 @@
+{"version":3,"sources":[],"sourcesContent":[],"mappings":"","names":[]}