@se-studio/site-check 1.0.1

package/CHANGELOG.md ADDED
@@ -0,0 +1,14 @@
1
+ # Changelog
2
+
3
+ ## 1.0.1
4
+
5
+ ### Patch Changes
6
+
7
+ - Bulk version bump: patch for all packages
8
+
9
+ ## 1.0.0
10
+
11
+ - Initial release.
12
+ - Validate sitemap.xml, sitemap-unindexed.xml, and llms.txt.
13
+ - Download all markdown files preserving site structure.
14
+ - Options: `-o`/`--out`, `-H`/`--header`, `--vercel-bypass [secret]` for Vercel Deployment Protection.
package/README.md ADDED
@@ -0,0 +1,76 @@
1
+ # @se-studio/site-check
2
+
3
+ Validate SE marketing sites (sitemap.xml, sitemap-unindexed.xml, llms.txt) and download all markdown files into a local directory, preserving URL path structure.
4
+
5
+ ## Usage
6
+
7
+ ```bash
8
+ npx @se-studio/site-check <baseUrl> [-o dir] [-H "Name: value"] [--vercel-bypass [secret]] [--check-against <devUrl>]
9
+ ```
10
+
11
+ ### Examples
12
+
13
+ ```bash
14
+ # Local site
15
+ npx @se-studio/site-check http://localhost:3015
16
+
17
+ # Custom output directory
18
+ npx @se-studio/site-check http://localhost:3015 -o ./out
19
+
20
+ # Compare production vs development: use production sitemap, check pages exist on dev
21
+ npx @se-studio/site-check https://example.com --check-against http://localhost:3010
22
+
23
+ # Vercel Deployment Protection: use env secret
24
+ VERCEL_AUTOMATION_BYPASS_SECRET=your-secret npx @se-studio/site-check https://preview.vercel.app --vercel-bypass
25
+
26
+ # Vercel bypass with explicit secret
27
+ npx @se-studio/site-check https://preview.vercel.app --vercel-bypass your-secret
28
+
29
+ # Custom headers
30
+ npx @se-studio/site-check https://example.com -H "Authorization: Bearer token" -H "X-Custom: value"
31
+ ```
32
+
33
+ ## What it does
34
+
35
+ 1. **Validates** the site by fetching:
36
+ - `sitemap.xml` (required) — must return 200 and contain `<loc>` entries; if it is a sitemap index, child sitemaps are fetched and must return valid urlset content
37
+ - `sitemap-unindexed.xml` (optional) — if 404, a warning is printed and the run continues
38
+ - `llms.txt` (required) — must return 200
39
+
40
+ 2. **Collects markdown URLs** from **sitemap.xml** and **sitemap-unindexed.xml** (Option B). When either URL returns a **sitemap index** (a `<sitemapindex>` with child `<sitemap><loc>...</loc></sitemap>` entries), the tool follows those links and collects page URLs from each child sitemap's urlset, one level deep only. When the response is a plain urlset, every `<loc>` is treated as a page URL. Each page URL is then converted to a `.md` URL (path + `.md`), so that all indexed and unindexed pages are checked. URLs under `/download/` and static files such as PDFs are skipped because they have no markdown version.
41
+
42
+ 3. **Checks for unexpected rewrites**: For each page URL from the sitemaps, the tool sends a HEAD request and reads the `x-nextjs-rewritten-path` response header. If the header is present and the rewritten path is different from the requested path (after normalisation), the run **fails** (exit 1). This catches cases where the sitemap lists one URL (e.g. `/learning-hub/blog/`) but the app rewrites to another (e.g. `/articles/blog`). Rewrites that only add or remove a locale prefix are allowed when `SITE_CHECK_LOCALES` is set (see [Options](#options)).
43
+
44
+ 4. **Checks and downloads**: For each markdown URL, the tool fetches it. If any return non-2xx, the run **fails** (exit 1) and reports which URLs are missing or errored. Otherwise it saves each response to the output directory, preserving path structure (e.g. `blog/foo.md` → `./markdown-export/blog/foo.md`, `es-US/about.md` → `./markdown-export/es-US/about.md`).
45
+
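The collection step described above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: `classifySitemap` and `toMarkdownUrl` are hypothetical names, the `<sitemapindex>` detection is an assumption, and the handling of the root page (mapped here to `/index.md`) is not stated anywhere in the README. Only the `<loc>` regular expression mirrors the published source.

```javascript
// Pull every <loc> value out of a sitemap body (same regex as the package source).
function extractLocUrls(xmlBody) {
  const urls = [];
  const re = /<loc>([^<]+)<\/loc>/g;
  for (let m = re.exec(xmlBody); m !== null; m = re.exec(xmlBody)) {
    urls.push(m[1].trim());
  }
  return urls.filter(Boolean);
}

// Assumption: a sitemap index is recognised by its <sitemapindex> root element.
// For an index, `locs` are child sitemap URLs; for a urlset, they are page URLs.
function classifySitemap(xmlBody) {
  const isIndex = /<sitemapindex[\s>]/.test(xmlBody);
  return { kind: isIndex ? 'index' : 'urlset', locs: extractLocUrls(xmlBody) };
}

// Assumption: the .md URL is the page path without its trailing slash, plus ".md".
// The root page is mapped to /index.md purely for illustration.
function toMarkdownUrl(pageUrl) {
  const u = new URL(pageUrl);
  const path = u.pathname.replace(/\/$/, '');
  return path === '' ? `${u.origin}/index.md` : `${u.origin}${path}.md`;
}
```

Under these assumptions, `toMarkdownUrl('https://example.com/blog/foo/')` yields `https://example.com/blog/foo.md`, which matches the `blog/foo.md` output path shown in step 4.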
46
+ ### Compare mode (`--check-against`)
47
+
48
+ When `--check-against <devUrl>` is set, the tool runs in **compare mode**:
49
+
50
+ - **Production** (the `<baseUrl>`) is the source of the URL list: sitemap.xml and sitemap-unindexed.xml are fetched from production. Validation (and llms.txt) applies to production.
51
+ - **Development** (`<devUrl>`) is the site being checked: each production sitemap page URL is mapped to the same path on the development origin. The tool then sends a HEAD request to each development URL, requires a 2xx response, and runs the rewrite check against the development site. Markdown collection and download are **not** performed in compare mode.
52
+
53
+ Use this to ensure a development or staging build has the same pages as production (e.g. before release). Page title comparison is not yet implemented; a future option may add it.
54
+
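The production-to-development mapping used in compare mode might look something like the sketch below. `mapToDevOrigin` is a hypothetical name; the package exports `mapProdPageUrlsToDev`, which additionally excludes download paths and static files and may differ in other details. Keeping the query string is an assumption.

```javascript
// Hypothetical sketch: re-home each production page URL on the development origin.
function mapToDevOrigin(prodPageUrls, devBaseUrl) {
  const devBase = devBaseUrl.replace(/\/$/, ''); // strip one trailing slash
  const mapped = [];
  for (const prodUrl of prodPageUrls) {
    let parsed;
    try {
      parsed = new URL(prodUrl);
    } catch {
      continue; // skip entries that are not absolute URLs
    }
    // Same path (and query string, by assumption) on the development origin.
    mapped.push(`${devBase}${parsed.pathname}${parsed.search}`);
  }
  return mapped;
}
```

For example, `https://example.com/blog/foo/` checked against `http://localhost:3010` becomes `http://localhost:3010/blog/foo/`.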
55
+ ## Options
56
+
57
+ | Option | Description |
58
+ |--------|-------------|
59
+ | `baseUrl` | Base URL of the site (required). In compare mode this is the **production** URL (sitemap source). Trailing slash is stripped. |
60
+ | `-o`, `--out <dir>` | Output directory for markdown files (default: `./markdown-export`). Ignored in compare mode. |
61
+ | `-H`, `--header "Name: value"` | Add a request header (repeatable). Applied to both production and development requests in compare mode. |
62
+ | `--check-against <devUrl>` | Compare mode: use production sitemap(s) but check that each page exists on development (same path on `<devUrl>`). No markdown download. |
63
+ | `--vercel-bypass [secret]` | Set `x-vercel-protection-bypass` for [Vercel Deployment Protection](https://vercel.com/docs/deployment-protection/methods-to-bypass-deployment-protection/protection-bypass-automation). If `secret` is omitted, uses `VERCEL_AUTOMATION_BYPASS_SECRET` from the environment. |
64
+
65
+ Headers from `-H` override the Vercel bypass header if the same name is used.
66
+
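The precedence rule above (a `-H` header wins over the Vercel bypass header) can be illustrated with a small sketch. `buildHeaders` is a hypothetical helper written for this note; the package's own `buildFetchOptions` is not shown in this diff and may work differently. The environment is passed in as a plain object so the example has no side effects.

```javascript
// Hypothetical sketch of header assembly with "-H overrides bypass" precedence.
function buildHeaders({ headerArgs, vercelBypassRequested, vercelBypassSecret, env }) {
  const headers = {};
  if (vercelBypassRequested) {
    // An explicit secret wins; otherwise fall back to the environment variable.
    const secret = vercelBypassSecret ?? env.VERCEL_AUTOMATION_BYPASS_SECRET;
    if (secret) headers['x-vercel-protection-bypass'] = secret;
  }
  for (const raw of headerArgs) {
    const colon = raw.indexOf(':');
    if (colon <= 0) continue; // ignore arguments that are not "Name: value"
    // Header names are case-insensitive, so lowercase them before storing.
    const name = raw.slice(0, colon).trim().toLowerCase();
    headers[name] = raw.slice(colon + 1).trim();
  }
  return headers;
}
```

Because the `-H` values are applied last, a header such as `-H "x-vercel-protection-bypass: other"` replaces the value derived from `--vercel-bypass`.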
67
+ ### Environment
68
+
69
+ | Variable | Description |
70
+ |----------|-------------|
71
+ | `SITE_CHECK_LOCALES` | Comma-separated list of locale path segments (e.g. `en,en-gb,de`). When set, a rewrite is considered acceptable if the only difference between the requested path and the rewritten path is a leading locale segment. If unset, any rewrite to a different path fails. |
72
+
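To make the `SITE_CHECK_LOCALES` rule concrete, here is a standalone sketch of the comparison. `parseLocales` and `normalisePath` follow the logic in the published `check-rewrites.js`; `isAcceptableRewrite` is a hypothetical wrapper added only for this example.

```javascript
// Split a comma-separated locale list into lowercase path segments.
function parseLocales(raw) {
  if (!raw) return [];
  return raw.split(',').map((s) => s.trim().toLowerCase()).filter(Boolean);
}

// Strip a trailing slash, then drop a leading segment that matches a locale.
function normalisePath(pathname, localePrefixes) {
  let path = pathname.replace(/\/$/, '') || '/';
  if (path !== '/' && localePrefixes.length > 0) {
    const segments = path.split('/').filter(Boolean);
    const first = segments[0];
    if (first && localePrefixes.includes(first.toLowerCase())) {
      segments.shift();
      path = segments.length > 0 ? `/${segments.join('/')}` : '/';
    }
  }
  return path;
}

// Hypothetical wrapper: a rewrite is acceptable when both paths normalise equally.
function isAcceptableRewrite(requestedPath, rewrittenPath, localePrefixes) {
  return normalisePath(requestedPath, localePrefixes) === normalisePath(rewrittenPath, localePrefixes);
}
```

With `SITE_CHECK_LOCALES=en,en-gb,de`, a rewrite from `/about/` to `/en-gb/about` is accepted, while a rewrite from `/learning-hub/blog/` to `/articles/blog` still fails. With the variable unset, the first case fails too.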
73
+ ## Exit codes
74
+
75
+ - `0` — Validation passed, no unexpected rewrites, every sitemap page returned 200 for its `.md` URL, and files were saved (or in compare mode: all production sitemap pages exist on development and no unexpected rewrites).
76
+ - `1` — Usage error (missing baseUrl), validation failed (sitemap.xml or llms.txt), one or more sitemap pages rewrite to a different path, one or more markdown URLs returned non-2xx (pages missing or broken), or in compare mode one or more development URLs did not return 2xx.
@@ -0,0 +1,19 @@
1
+ /**
2
+ * Check that each page URL returns 2xx (e.g. HEAD). Used in compare mode to verify
3
+ * development has the same pages as production sitemap.
4
+ */
5
+ interface PageExistError {
6
+ url: string;
7
+ status: number;
8
+ message: string;
9
+ }
10
+ interface CheckPagesExistResult {
11
+ ok: boolean;
12
+ errors: PageExistError[];
13
+ }
14
+ /**
15
+ * For each page URL, send HEAD request. Require 2xx. Return errors for non-2xx.
16
+ */
17
+ export declare function checkPagesExist(pageUrls: string[], fetchInit: RequestInit): Promise<CheckPagesExistResult>;
18
+ export {};
19
+ //# sourceMappingURL=check-pages-exist.d.ts.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"check-pages-exist.d.ts","sourceRoot":"","sources":["../src/check-pages-exist.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,UAAU,cAAc;IACtB,GAAG,EAAE,MAAM,CAAC;IACZ,MAAM,EAAE,MAAM,CAAC;IACf,OAAO,EAAE,MAAM,CAAC;CACjB;AAED,UAAU,qBAAqB;IAC7B,EAAE,EAAE,OAAO,CAAC;IACZ,MAAM,EAAE,cAAc,EAAE,CAAC;CAC1B;AAED;;GAEG;AACH,wBAAsB,eAAe,CACnC,QAAQ,EAAE,MAAM,EAAE,EAClB,SAAS,EAAE,WAAW,GACrB,OAAO,CAAC,qBAAqB,CAAC,CAiBhC"}
@@ -0,0 +1,25 @@
1
+ /**
2
+ * Check that each page URL returns 2xx (e.g. HEAD). Used in compare mode to verify
3
+ * development has the same pages as production sitemap.
4
+ */
5
+ /**
6
+ * For each page URL, send HEAD request. Require 2xx. Return errors for non-2xx.
7
+ */
8
+ export async function checkPagesExist(pageUrls, fetchInit) {
9
+ const errors = [];
10
+ for (const url of pageUrls) {
11
+ const res = await fetch(url, { ...fetchInit, method: 'HEAD' });
12
+ if (res.ok)
13
+ continue;
14
+ errors.push({
15
+ url,
16
+ status: res.status,
17
+ message: `HTTP ${res.status}`,
18
+ });
19
+ }
20
+ return {
21
+ ok: errors.length === 0,
22
+ errors,
23
+ };
24
+ }
25
+ //# sourceMappingURL=check-pages-exist.js.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"check-pages-exist.js","sourceRoot":"","sources":["../src/check-pages-exist.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAaH;;GAEG;AACH,MAAM,CAAC,KAAK,UAAU,eAAe,CACnC,QAAkB,EAClB,SAAsB;IAEtB,MAAM,MAAM,GAAqB,EAAE,CAAC;IAEpC,KAAK,MAAM,GAAG,IAAI,QAAQ,EAAE,CAAC;QAC3B,MAAM,GAAG,GAAG,MAAM,KAAK,CAAC,GAAG,EAAE,EAAE,GAAG,SAAS,EAAE,MAAM,EAAE,MAAM,EAAE,CAAC,CAAC;QAC/D,IAAI,GAAG,CAAC,EAAE;YAAE,SAAS;QACrB,MAAM,CAAC,IAAI,CAAC;YACV,GAAG;YACH,MAAM,EAAE,GAAG,CAAC,MAAM;YAClB,OAAO,EAAE,QAAQ,GAAG,CAAC,MAAM,EAAE;SAC9B,CAAC,CAAC;IACL,CAAC;IAED,OAAO;QACL,EAAE,EAAE,MAAM,CAAC,MAAM,KAAK,CAAC;QACvB,MAAM;KACP,CAAC;AACJ,CAAC"}
@@ -0,0 +1,21 @@
1
+ /**
2
+ * Verify that sitemap page URLs do not rewrite to different paths (except localisation).
3
+ * Fetches each page URL and checks the x-nextjs-rewritten-path response header.
4
+ */
5
+ interface RewriteCheckError {
6
+ url: string;
7
+ requestedPath: string;
8
+ rewrittenPath: string;
9
+ }
10
+ interface RewriteCheckResult {
11
+ ok: boolean;
12
+ errors: RewriteCheckError[];
13
+ }
14
+ /**
15
+ * For each page URL, fetch with HEAD and check x-nextjs-rewritten-path.
16
+ * If the header is present and the rewritten path is not equivalent to the requested path
17
+ * (after normalisation and optional locale stripping), record an error.
18
+ */
19
+ export declare function checkRewrites(pageUrls: string[], fetchInit: RequestInit): Promise<RewriteCheckResult>;
20
+ export {};
21
+ //# sourceMappingURL=check-rewrites.d.ts.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"check-rewrites.d.ts","sourceRoot":"","sources":["../src/check-rewrites.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAuCH,UAAU,iBAAiB;IACzB,GAAG,EAAE,MAAM,CAAC;IACZ,aAAa,EAAE,MAAM,CAAC;IACtB,aAAa,EAAE,MAAM,CAAC;CACvB;AAED,UAAU,kBAAkB;IAC1B,EAAE,EAAE,OAAO,CAAC;IACZ,MAAM,EAAE,iBAAiB,EAAE,CAAC;CAC7B;AAED;;;;GAIG;AACH,wBAAsB,aAAa,CACjC,QAAQ,EAAE,MAAM,EAAE,EAClB,SAAS,EAAE,WAAW,GACrB,OAAO,CAAC,kBAAkB,CAAC,CA0B7B"}
@@ -0,0 +1,69 @@
1
+ /**
2
+ * Verify that sitemap page URLs do not rewrite to different paths (except localisation).
3
+ * Fetches each page URL and checks the x-nextjs-rewritten-path response header.
4
+ */
5
+ const REWRITTEN_PATH_HEADER = 'x-nextjs-rewritten-path';
6
+ const LOCALES_ENV = 'SITE_CHECK_LOCALES';
7
+ function getLocalePrefixes() {
8
+ const raw = process.env[LOCALES_ENV];
9
+ if (!raw || typeof raw !== 'string')
10
+ return [];
11
+ return raw
12
+ .split(',')
13
+ .map((s) => s.trim().toLowerCase())
14
+ .filter(Boolean);
15
+ }
16
+ /**
17
+ * Normalise path for comparison: strip a trailing slash (path case is preserved).
18
+ * If localePrefixes are given, strip a leading segment that matches a locale (case-insensitive).
19
+ */
20
+ function normalisePath(pathname, localePrefixes) {
21
+ let path = pathname.replace(/\/$/, '') || '/';
22
+ if (path !== '/' && localePrefixes.length > 0) {
23
+ const segments = path.split('/').filter(Boolean);
24
+ const first = segments[0];
25
+ if (first && localePrefixes.includes(first.toLowerCase())) {
26
+ segments.shift();
27
+ path = segments.length > 0 ? `/${segments.join('/')}` : '/';
28
+ }
29
+ }
30
+ return path;
31
+ }
32
+ function pathnameFromUrl(url) {
33
+ try {
34
+ return new URL(url).pathname;
35
+ }
36
+ catch {
37
+ return '/';
38
+ }
39
+ }
40
+ /**
41
+ * For each page URL, fetch with HEAD and check x-nextjs-rewritten-path.
42
+ * If the header is present and the rewritten path is not equivalent to the requested path
43
+ * (after normalisation and optional locale stripping), record an error.
44
+ */
45
+ export async function checkRewrites(pageUrls, fetchInit) {
46
+ const localePrefixes = getLocalePrefixes();
47
+ const errors = [];
48
+ for (const url of pageUrls) {
49
+ const res = await fetch(url, { ...fetchInit, method: 'HEAD' });
50
+ const rewritten = res.headers.get(REWRITTEN_PATH_HEADER);
51
+ if (!rewritten)
52
+ continue;
53
+ const requestedPath = pathnameFromUrl(url);
54
+ const requestedNorm = normalisePath(requestedPath, localePrefixes);
55
+ const rewrittenNorm = normalisePath(rewritten, localePrefixes);
56
+ if (requestedNorm !== rewrittenNorm) {
57
+ errors.push({
58
+ url,
59
+ requestedPath: requestedPath || '/',
60
+ rewrittenPath: rewritten,
61
+ });
62
+ }
63
+ }
64
+ return {
65
+ ok: errors.length === 0,
66
+ errors,
67
+ };
68
+ }
69
+ //# sourceMappingURL=check-rewrites.js.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"check-rewrites.js","sourceRoot":"","sources":["../src/check-rewrites.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,MAAM,qBAAqB,GAAG,yBAAyB,CAAC;AACxD,MAAM,WAAW,GAAG,oBAAoB,CAAC;AAEzC,SAAS,iBAAiB;IACxB,MAAM,GAAG,GAAG,OAAO,CAAC,GAAG,CAAC,WAAW,CAAC,CAAC;IACrC,IAAI,CAAC,GAAG,IAAI,OAAO,GAAG,KAAK,QAAQ;QAAE,OAAO,EAAE,CAAC;IAC/C,OAAO,GAAG;SACP,KAAK,CAAC,GAAG,CAAC;SACV,GAAG,CAAC,CAAC,CAAC,EAAE,EAAE,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC,WAAW,EAAE,CAAC;SAClC,MAAM,CAAC,OAAO,CAAC,CAAC;AACrB,CAAC;AAED;;;GAGG;AACH,SAAS,aAAa,CAAC,QAAgB,EAAE,cAAwB;IAC/D,IAAI,IAAI,GAAG,QAAQ,CAAC,OAAO,CAAC,KAAK,EAAE,EAAE,CAAC,IAAI,GAAG,CAAC;IAC9C,IAAI,IAAI,KAAK,GAAG,IAAI,cAAc,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;QAC9C,MAAM,QAAQ,GAAG,IAAI,CAAC,KAAK,CAAC,GAAG,CAAC,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC;QACjD,MAAM,KAAK,GAAG,QAAQ,CAAC,CAAC,CAAC,CAAC;QAC1B,IAAI,KAAK,IAAI,cAAc,CAAC,QAAQ,CAAC,KAAK,CAAC,WAAW,EAAE,CAAC,EAAE,CAAC;YAC1D,QAAQ,CAAC,KAAK,EAAE,CAAC;YACjB,IAAI,GAAG,QAAQ,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC,CAAC,IAAI,QAAQ,CAAC,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,CAAC,CAAC,GAAG,CAAC;QAC9D,CAAC;IACH,CAAC;IACD,OAAO,IAAI,CAAC;AACd,CAAC;AAED,SAAS,eAAe,CAAC,GAAW;IAClC,IAAI,CAAC;QACH,OAAO,IAAI,GAAG,CAAC,GAAG,CAAC,CAAC,QAAQ,CAAC;IAC/B,CAAC;IAAC,MAAM,CAAC;QACP,OAAO,GAAG,CAAC;IACb,CAAC;AACH,CAAC;AAaD;;;;GAIG;AACH,MAAM,CAAC,KAAK,UAAU,aAAa,CACjC,QAAkB,EAClB,SAAsB;IAEtB,MAAM,cAAc,GAAG,iBAAiB,EAAE,CAAC;IAC3C,MAAM,MAAM,GAAwB,EAAE,CAAC;IAEvC,KAAK,MAAM,GAAG,IAAI,QAAQ,EAAE,CAAC;QAC3B,MAAM,GAAG,GAAG,MAAM,KAAK,CAAC,GAAG,EAAE,EAAE,GAAG,SAAS,EAAE,MAAM,EAAE,MAAM,EAAE,CAAC,CAAC;QAC/D,MAAM,SAAS,GAAG,GAAG,CAAC,OAAO,CAAC,GAAG,CAAC,qBAAqB,CAAC,CAAC;QACzD,IAAI,CAAC,SAAS;YAAE,SAAS;QAEzB,MAAM,aAAa,GAAG,eAAe,CAAC,GAAG,CAAC,CAAC;QAC3C,MAAM,aAAa,GAAG,aAAa,CAAC,aAAa,EAAE,cAAc,CAAC,CAAC;QACnE,MAAM,aAAa,GAAG,aAAa,CAAC,SAAS,EAAE,cAAc,CAAC,CAAC;QAE/D,IAAI,aAAa,KAAK,aAAa,EAAE,CAAC;YACpC,MAAM,CAAC,IAAI,CAAC;gBACV,GAAG;gBACH,aAAa,EAAE,aAAa,IAAI,GAAG;gBACnC,aAAa,EAAE,SAAS;aACzB,CAAC,CAAC;QACL,CAAC;IACH,CAAC;IAED,OAAO;QACL,EAAE,EAAE,MAAM,CAAC,MAAM,KAAK,CAAC;Q
ACvB,MAAM;KACP,CAAC;AACJ,CAAC"}
package/dist/cli.d.ts ADDED
@@ -0,0 +1,4 @@
1
+ #!/usr/bin/env node
2
+ /** biome-ignore-all lint/suspicious/noConsole: CLI output is intentional */
3
+ export {};
4
+ //# sourceMappingURL=cli.d.ts.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"cli.d.ts","sourceRoot":"","sources":["../src/cli.ts"],"names":[],"mappings":";AAEA,4EAA4E"}
package/dist/cli.js ADDED
@@ -0,0 +1,185 @@
1
+ #!/usr/bin/env node
2
+ /** biome-ignore-all lint/suspicious/noConsole: CLI output is intentional */
3
+ import { checkPagesExist } from './check-pages-exist.js';
4
+ import { checkRewrites } from './check-rewrites.js';
5
+ import { collectMarkdownUrlsFromSitemaps, mapProdPageUrlsToDev, pageUrlsFromSitemaps, } from './collect-markdown-urls.js';
6
+ import { downloadMarkdownFiles } from './download.js';
7
+ import { buildFetchOptions } from './fetch-options.js';
8
+ import { validate, validationFailed } from './validate.js';
9
+ const DEFAULT_OUT = './markdown-export';
10
+ function isValidAbsoluteUrl(url) {
11
+ try {
12
+ const u = new URL(url);
13
+ return u.protocol === 'http:' || u.protocol === 'https:';
14
+ }
15
+ catch {
16
+ return false;
17
+ }
18
+ }
19
+ function parseArgs(argv) {
20
+ const headerArgs = [];
21
+ let outDir = DEFAULT_OUT;
22
+ let vercelBypassRequested = false;
23
+ let vercelBypassSecret = null;
24
+ let baseUrl = null;
25
+ let checkAgainstUrl = null;
26
+ for (let i = 0; i < argv.length; i++) {
27
+ const arg = argv[i];
28
+ if (arg === '-o' || arg === '--out') {
29
+ const next = argv[i + 1];
30
+ if (next && !next.startsWith('-')) {
31
+ outDir = next;
32
+ i++;
33
+ }
34
+ }
35
+ else if (arg === '-H' || arg === '--header') {
36
+ const next = argv[i + 1];
37
+ if (next) {
38
+ headerArgs.push(next);
39
+ i++;
40
+ }
41
+ }
42
+ else if (arg === '--check-against') {
43
+ const next = argv[i + 1];
44
+ if (next && (next.startsWith('http://') || next.startsWith('https://'))) {
45
+ checkAgainstUrl = next.replace(/\/$/, '');
46
+ i++;
47
+ }
48
+ }
49
+ else if (arg === '--vercel-bypass') {
50
+ vercelBypassRequested = true;
51
+ const next = argv[i + 1];
52
+ if (next && !next.startsWith('-')) {
53
+ vercelBypassSecret = next;
54
+ i++;
55
+ }
56
+ }
57
+ else if (arg.startsWith('http://') || arg.startsWith('https://')) {
58
+ baseUrl = arg.replace(/\/$/, '');
59
+ }
60
+ }
61
+ return {
62
+ baseUrl,
63
+ checkAgainstUrl,
64
+ outDir,
65
+ headerArgs,
66
+ vercelBypassRequested,
67
+ vercelBypassSecret,
68
+ };
69
+ }
70
+ async function main() {
71
+ const args = parseArgs(process.argv.slice(2));
72
+ if (!args.baseUrl) {
73
+ console.error('Usage: site-check <baseUrl> [-o dir] [-H "Name: value"] [--vercel-bypass [secret]] [--check-against <devUrl>]');
74
+ console.error('Example: site-check http://localhost:3015 -o ./out');
75
+ console.error('Example (compare prod vs dev): site-check https://example.com --check-against http://localhost:3010');
76
+ return 1;
77
+ }
78
+ const fetchInit = buildFetchOptions({
79
+ headerArgs: args.headerArgs,
80
+ vercelBypassRequested: args.vercelBypassRequested,
81
+ vercelBypassSecret: args.vercelBypassSecret,
82
+ });
83
+ const isCompareMode = args.checkAgainstUrl !== null;
84
+ const prodUrl = args.baseUrl;
85
+ if (isCompareMode) {
86
+ console.log(`Comparing: production ${prodUrl} vs development ${args.checkAgainstUrl}`);
87
+ }
88
+ console.log('Validating…');
89
+ const validation = await validate(prodUrl, fetchInit);
90
+ const sitemapOk = validation.sitemapXml.ok ? 'OK' : `FAIL (${validation.sitemapXml.status})`;
91
+ console.log(` sitemap.xml … ${sitemapOk}`);
92
+ if (validation.sitemapUnindexedXml.warned) {
93
+ console.log(' sitemap-unindexed.xml … not found (optional, continuing)');
94
+ }
95
+ else {
96
+ const unindexedOk = validation.sitemapUnindexedXml.ok
97
+ ? 'OK'
98
+ : `FAIL (${validation.sitemapUnindexedXml.status})`;
99
+ console.log(` sitemap-unindexed.xml … ${unindexedOk}`);
100
+ }
101
+ const llmsOk = validation.llmsTxt.ok ? 'OK' : `FAIL (${validation.llmsTxt.status})`;
102
+ console.log(` llms.txt … ${llmsOk}`);
103
+ if (validationFailed(validation)) {
104
+ console.error('Validation failed. Fix sitemap.xml and/or llms.txt and try again.');
105
+ return 1;
106
+ }
107
+ if (isCompareMode) {
108
+ const devBaseUrl = args.checkAgainstUrl;
109
+ const prodPageUrls = pageUrlsFromSitemaps(validation);
110
+ const devPageUrls = mapProdPageUrlsToDev(prodPageUrls, devBaseUrl);
111
+ const indexNote = validation.sitemapWasIndex ? ' (including from sitemap index)' : '';
112
+ console.log(`Using sitemap from production: ${prodPageUrls.length} page(s) → checking ${devPageUrls.length} on development${indexNote}`);
113
+ if (devPageUrls.length === 0) {
114
+ console.log('No page URLs to check on development.');
115
+ return 0;
116
+ }
117
+ const validDevUrls = devPageUrls.filter(isValidAbsoluteUrl);
118
+ if (validDevUrls.length > 0) {
119
+ console.log(`Checking ${validDevUrls.length} page(s) exist on development…`);
120
+ const existResult = await checkPagesExist(validDevUrls, fetchInit);
121
+ if (!existResult.ok) {
122
+ console.error(`\n${existResult.errors.length} page(s) did not return 2xx on development:`);
123
+ for (const e of existResult.errors) {
124
+ console.error(` ${e.url} … ${e.message}`);
125
+ }
126
+ return 1;
127
+ }
128
+ console.log(`Checking ${validDevUrls.length} page(s) for unexpected rewrites…`);
129
+ const rewriteResult = await checkRewrites(validDevUrls, fetchInit);
130
+ if (!rewriteResult.ok) {
131
+ console.error(`\n${rewriteResult.errors.length} page(s) rewrite to a different path (x-nextjs-rewritten-path):`);
132
+ for (const e of rewriteResult.errors) {
133
+ console.error(` ${e.url}`);
134
+ console.error(` requested: ${e.requestedPath} → rewritten: ${e.rewrittenPath}`);
135
+ }
136
+ console.error('\nEnsure sitemap URLs match the app routes, or use path rewrites only for localisation.');
137
+ return 1;
138
+ }
139
+ }
140
+ console.log('All production sitemap pages exist on development.');
141
+ return 0;
142
+ }
143
+ console.log('Collecting markdown URLs from sitemaps…');
144
+ const urls = collectMarkdownUrlsFromSitemaps(args.baseUrl, validation);
145
+ const indexNote = validation.sitemapWasIndex ? ' (including from sitemap index)' : '';
146
+ console.log(` Found ${urls.length} URLs (sitemap.xml + sitemap-unindexed.xml)${indexNote}`);
147
+ if (urls.length === 0) {
148
+ console.log('No markdown URLs to check or download.');
149
+ return 0;
150
+ }
151
+ const pageUrls = pageUrlsFromSitemaps(validation);
152
+ const validPageUrls = pageUrls.filter(isValidAbsoluteUrl);
153
+ if (validPageUrls.length > 0) {
154
+ console.log(`Checking ${validPageUrls.length} sitemap page(s) for unexpected rewrites…`);
155
+ const rewriteResult = await checkRewrites(validPageUrls, fetchInit);
156
+ if (!rewriteResult.ok) {
157
+ console.error(`\n${rewriteResult.errors.length} page(s) rewrite to a different path (x-nextjs-rewritten-path):`);
158
+ for (const e of rewriteResult.errors) {
159
+ console.error(` ${e.url}`);
160
+ console.error(` requested: ${e.requestedPath} → rewritten: ${e.rewrittenPath}`);
161
+ }
162
+ console.error('\nEnsure sitemap URLs match the app routes, or use path rewrites only for localisation.');
163
+ return 1;
164
+ }
165
+ }
166
+ console.log(`Checking and downloading ${urls.length} markdown files…`);
167
+ const downloadResult = await downloadMarkdownFiles(urls, args.outDir, fetchInit);
168
+ if (downloadResult.errors.length > 0) {
169
+ console.error(`\n${downloadResult.errors.length} page(s) did not return 200 (markdown missing or error):`);
170
+ for (const e of downloadResult.errors) {
171
+ console.error(` ${e.url} … ${e.message}`);
172
+ }
173
+ console.error('\nFix the markdown routes or sitemap so every sitemap URL has a working .md version.');
174
+ return 1;
175
+ }
176
+ console.log(`Saved ${downloadResult.saved} files to ${args.outDir}`);
177
+ return 0;
178
+ }
179
+ main()
180
+ .then((code) => process.exit(code))
181
+ .catch((err) => {
182
+ console.error(err);
183
+ process.exit(1);
184
+ });
185
+ //# sourceMappingURL=cli.js.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"cli.js","sourceRoot":"","sources":["../src/cli.ts"],"names":[],"mappings":";AAEA,4EAA4E;AAE5E,OAAO,EAAE,eAAe,EAAE,MAAM,wBAAwB,CAAC;AACzD,OAAO,EAAE,aAAa,EAAE,MAAM,qBAAqB,CAAC;AACpD,OAAO,EACL,+BAA+B,EAC/B,oBAAoB,EACpB,oBAAoB,GACrB,MAAM,4BAA4B,CAAC;AACpC,OAAO,EAAE,qBAAqB,EAAE,MAAM,eAAe,CAAC;AACtD,OAAO,EAAE,iBAAiB,EAAE,MAAM,oBAAoB,CAAC;AACvD,OAAO,EAAE,QAAQ,EAAE,gBAAgB,EAAE,MAAM,eAAe,CAAC;AAE3D,MAAM,WAAW,GAAG,mBAAmB,CAAC;AAExC,SAAS,kBAAkB,CAAC,GAAW;IACrC,IAAI,CAAC;QACH,MAAM,CAAC,GAAG,IAAI,GAAG,CAAC,GAAG,CAAC,CAAC;QACvB,OAAO,CAAC,CAAC,QAAQ,KAAK,OAAO,IAAI,CAAC,CAAC,QAAQ,KAAK,QAAQ,CAAC;IAC3D,CAAC;IAAC,MAAM,CAAC;QACP,OAAO,KAAK,CAAC;IACf,CAAC;AACH,CAAC;AAWD,SAAS,SAAS,CAAC,IAAc;IAC/B,MAAM,UAAU,GAAa,EAAE,CAAC;IAChC,IAAI,MAAM,GAAG,WAAW,CAAC;IACzB,IAAI,qBAAqB,GAAG,KAAK,CAAC;IAClC,IAAI,kBAAkB,GAAkB,IAAI,CAAC;IAC7C,IAAI,OAAO,GAAkB,IAAI,CAAC;IAClC,IAAI,eAAe,GAAkB,IAAI,CAAC;IAE1C,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,IAAI,CAAC,MAAM,EAAE,CAAC,EAAE,EAAE,CAAC;QACrC,MAAM,GAAG,GAAG,IAAI,CAAC,CAAC,CAAC,CAAC;QACpB,IAAI,GAAG,KAAK,IAAI,IAAI,GAAG,KAAK,OAAO,EAAE,CAAC;YACpC,MAAM,IAAI,GAAG,IAAI,CAAC,CAAC,GAAG,CAAC,CAAC,CAAC;YACzB,IAAI,IAAI,IAAI,CAAC,IAAI,CAAC,UAAU,CAAC,GAAG,CAAC,EAAE,CAAC;gBAClC,MAAM,GAAG,IAAI,CAAC;gBACd,CAAC,EAAE,CAAC;YACN,CAAC;QACH,CAAC;aAAM,IAAI,GAAG,KAAK,IAAI,IAAI,GAAG,KAAK,UAAU,EAAE,CAAC;YAC9C,MAAM,IAAI,GAAG,IAAI,CAAC,CAAC,GAAG,CAAC,CAAC,CAAC;YACzB,IAAI,IAAI,EAAE,CAAC;gBACT,UAAU,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC;gBACtB,CAAC,EAAE,CAAC;YACN,CAAC;QACH,CAAC;aAAM,IAAI,GAAG,KAAK,iBAAiB,EAAE,CAAC;YACrC,MAAM,IAAI,GAAG,IAAI,CAAC,CAAC,GAAG,CAAC,CAAC,CAAC;YACzB,IAAI,IAAI,IAAI,CAAC,IAAI,CAAC,UAAU,CAAC,SAAS,CAAC,IAAI,IAAI,CAAC,UAAU,CAAC,UAAU,CAAC,CAAC,EAAE,CAAC;gBACxE,eAAe,GAAG,IAAI,CAAC,OAAO,CAAC,KAAK,EAAE,EAAE,CAAC,CAAC;gBAC1C,CAAC,EAAE,CAAC;YACN,CAAC;QACH,CAAC;aAAM,IAAI,GAAG,KAAK,iBAAiB,EAAE,CAAC;YACrC,qBAAqB,GAAG,IAAI,CAAC;YAC7B,MAAM,IAAI,GAAG,IAAI,CAAC,CAAC,GAAG,CAAC,CAAC,CAAC;YACzB,IAAI,IAAI,IAAI,CAAC,IAAI,CAAC,UAAU,CAAC,GAAG,CAAC,EAAE,CAAC;gBAClC,kBAAkB,GAAG,I
AAI,CAAC;gBAC1B,CAAC,EAAE,CAAC;YACN,CAAC;QACH,CAAC;aAAM,IAAI,GAAG,CAAC,UAAU,CAAC,SAAS,CAAC,IAAI,GAAG,CAAC,UAAU,CAAC,UAAU,CAAC,EAAE,CAAC;YACnE,OAAO,GAAG,GAAG,CAAC,OAAO,CAAC,KAAK,EAAE,EAAE,CAAC,CAAC;QACnC,CAAC;IACH,CAAC;IAED,OAAO;QACL,OAAO;QACP,eAAe;QACf,MAAM;QACN,UAAU;QACV,qBAAqB;QACrB,kBAAkB;KACnB,CAAC;AACJ,CAAC;AAED,KAAK,UAAU,IAAI;IACjB,MAAM,IAAI,GAAG,SAAS,CAAC,OAAO,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC,CAAC,CAAC,CAAC;IAE9C,IAAI,CAAC,IAAI,CAAC,OAAO,EAAE,CAAC;QAClB,OAAO,CAAC,KAAK,CACX,+GAA+G,CAChH,CAAC;QACF,OAAO,CAAC,KAAK,CAAC,oDAAoD,CAAC,CAAC;QACpE,OAAO,CAAC,KAAK,CACX,qGAAqG,CACtG,CAAC;QACF,OAAO,CAAC,CAAC;IACX,CAAC;IAED,MAAM,SAAS,GAAG,iBAAiB,CAAC;QAClC,UAAU,EAAE,IAAI,CAAC,UAAU;QAC3B,qBAAqB,EAAE,IAAI,CAAC,qBAAqB;QACjD,kBAAkB,EAAE,IAAI,CAAC,kBAAkB;KAC5C,CAAC,CAAC;IAEH,MAAM,aAAa,GAAG,IAAI,CAAC,eAAe,KAAK,IAAI,CAAC;IACpD,MAAM,OAAO,GAAG,IAAI,CAAC,OAAO,CAAC;IAE7B,IAAI,aAAa,EAAE,CAAC;QAClB,OAAO,CAAC,GAAG,CAAC,yBAAyB,OAAO,mBAAmB,IAAI,CAAC,eAAe,EAAE,CAAC,CAAC;IACzF,CAAC;IAED,OAAO,CAAC,GAAG,CAAC,aAAa,CAAC,CAAC;IAC3B,MAAM,UAAU,GAAG,MAAM,QAAQ,CAAC,OAAO,EAAE,SAAS,CAAC,CAAC;IAEtD,MAAM,SAAS,GAAG,UAAU,CAAC,UAAU,CAAC,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC,CAAC,SAAS,UAAU,CAAC,UAAU,CAAC,MAAM,GAAG,CAAC;IAC7F,OAAO,CAAC,GAAG,CAAC,mBAAmB,SAAS,EAAE,CAAC,CAAC;IAE5C,IAAI,UAAU,CAAC,mBAAmB,CAAC,MAAM,EAAE,CAAC;QAC1C,OAAO,CAAC,GAAG,CAAC,4DAA4D,CAAC,CAAC;IAC5E,CAAC;SAAM,CAAC;QACN,MAAM,WAAW,GAAG,UAAU,CAAC,mBAAmB,CAAC,EAAE;YACnD,CAAC,CAAC,IAAI;YACN,CAAC,CAAC,SAAS,UAAU,CAAC,mBAAmB,CAAC,MAAM,GAAG,CAAC;QACtD,OAAO,CAAC,GAAG,CAAC,6BAA6B,WAAW,EAAE,CAAC,CAAC;IAC1D,CAAC;IAED,MAAM,MAAM,GAAG,UAAU,CAAC,OAAO,CAAC,EAAE,CAAC,CAAC,CAAC,IAAI,CAAC,CAAC,CAAC,SAAS,UAAU,CAAC,OAAO,CAAC,MAAM,GAAG,CAAC;IACpF,OAAO,CAAC,GAAG,CAAC,gBAAgB,MAAM,EAAE,CAAC,CAAC;IAEtC,IAAI,gBAAgB,CAAC,UAAU,CAAC,EAAE,CAAC;QACjC,OAAO,CAAC,KAAK,CAAC,mEAAmE,CAAC,CAAC;QACnF,OAAO,CAAC,CAAC;IACX,CAAC;IAED,IAAI,aAAa,EAAE,CAAC;QAClB,MAAM,UAAU,GAAG,IAAI,CAAC,eAAyB,CAAC;QAClD,MAAM,YAAY,GAAG,oBAAoB,CAAC,UAAU,CAAC,CAAC;QACtD,MAAM,WAAW,GAAG,oBAAoB,CAAC,YAAY,EAAE,UAAU,CAA
C,CAAC;QACnE,MAAM,SAAS,GAAG,UAAU,CAAC,eAAe,CAAC,CAAC,CAAC,iCAAiC,CAAC,CAAC,CAAC,EAAE,CAAC;QACtF,OAAO,CAAC,GAAG,CACT,kCAAkC,YAAY,CAAC,MAAM,uBAAuB,WAAW,CAAC,MAAM,kBAAkB,SAAS,EAAE,CAC5H,CAAC;QACF,IAAI,WAAW,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;YAC7B,OAAO,CAAC,GAAG,CAAC,uCAAuC,CAAC,CAAC;YACrD,OAAO,CAAC,CAAC;QACX,CAAC;QACD,MAAM,YAAY,GAAG,WAAW,CAAC,MAAM,CAAC,kBAAkB,CAAC,CAAC;QAC5D,IAAI,YAAY,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;YAC5B,OAAO,CAAC,GAAG,CAAC,YAAY,YAAY,CAAC,MAAM,gCAAgC,CAAC,CAAC;YAC7E,MAAM,WAAW,GAAG,MAAM,eAAe,CAAC,YAAY,EAAE,SAAS,CAAC,CAAC;YACnE,IAAI,CAAC,WAAW,CAAC,EAAE,EAAE,CAAC;gBACpB,OAAO,CAAC,KAAK,CAAC,KAAK,WAAW,CAAC,MAAM,CAAC,MAAM,6CAA6C,CAAC,CAAC;gBAC3F,KAAK,MAAM,CAAC,IAAI,WAAW,CAAC,MAAM,EAAE,CAAC;oBACnC,OAAO,CAAC,KAAK,CAAC,KAAK,CAAC,CAAC,GAAG,MAAM,CAAC,CAAC,OAAO,EAAE,CAAC,CAAC;gBAC7C,CAAC;gBACD,OAAO,CAAC,CAAC;YACX,CAAC;YACD,OAAO,CAAC,GAAG,CAAC,YAAY,YAAY,CAAC,MAAM,mCAAmC,CAAC,CAAC;YAChF,MAAM,aAAa,GAAG,MAAM,aAAa,CAAC,YAAY,EAAE,SAAS,CAAC,CAAC;YACnE,IAAI,CAAC,aAAa,CAAC,EAAE,EAAE,CAAC;gBACtB,OAAO,CAAC,KAAK,CACX,KAAK,aAAa,CAAC,MAAM,CAAC,MAAM,iEAAiE,CAClG,CAAC;gBACF,KAAK,MAAM,CAAC,IAAI,aAAa,CAAC,MAAM,EAAE,CAAC;oBACrC,OAAO,CAAC,KAAK,CAAC,KAAK,CAAC,CAAC,GAAG,EAAE,CAAC,CAAC;oBAC5B,OAAO,CAAC,KAAK,CAAC,kBAAkB,CAAC,CAAC,aAAa,iBAAiB,CAAC,CAAC,aAAa,EAAE,CAAC,CAAC;gBACrF,CAAC;gBACD,OAAO,CAAC,KAAK,CACX,yFAAyF,CAC1F,CAAC;gBACF,OAAO,CAAC,CAAC;YACX,CAAC;QACH,CAAC;QACD,OAAO,CAAC,GAAG,CAAC,oDAAoD,CAAC,CAAC;QAClE,OAAO,CAAC,CAAC;IACX,CAAC;IAED,OAAO,CAAC,GAAG,CAAC,yCAAyC,CAAC,CAAC;IACvD,MAAM,IAAI,GAAG,+BAA+B,CAAC,IAAI,CAAC,OAAO,EAAE,UAAU,CAAC,CAAC;IACvE,MAAM,SAAS,GAAG,UAAU,CAAC,eAAe,CAAC,CAAC,CAAC,iCAAiC,CAAC,CAAC,CAAC,EAAE,CAAC;IACtF,OAAO,CAAC,GAAG,CAAC,WAAW,IAAI,CAAC,MAAM,8CAA8C,SAAS,EAAE,CAAC,CAAC;IAE7F,IAAI,IAAI,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;QACtB,OAAO,CAAC,GAAG,CAAC,wCAAwC,CAAC,CAAC;QACtD,OAAO,CAAC,CAAC;IACX,CAAC;IAED,MAAM,QAAQ,GAAG,oBAAoB,CAAC,UAAU,CAAC,CAAC;IAClD,MAAM,aAAa,GAAG,QAAQ,CAAC,MAAM,CAAC,kBAAkB,CAAC,CAAC;IAC1D,IAAI,aAAa,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;QAC7B,OAAO,CAAC,GAAG,CAAC,YAA
Y,aAAa,CAAC,MAAM,2CAA2C,CAAC,CAAC;QACzF,MAAM,aAAa,GAAG,MAAM,aAAa,CAAC,aAAa,EAAE,SAAS,CAAC,CAAC;QACpE,IAAI,CAAC,aAAa,CAAC,EAAE,EAAE,CAAC;YACtB,OAAO,CAAC,KAAK,CACX,KAAK,aAAa,CAAC,MAAM,CAAC,MAAM,iEAAiE,CAClG,CAAC;YACF,KAAK,MAAM,CAAC,IAAI,aAAa,CAAC,MAAM,EAAE,CAAC;gBACrC,OAAO,CAAC,KAAK,CAAC,KAAK,CAAC,CAAC,GAAG,EAAE,CAAC,CAAC;gBAC5B,OAAO,CAAC,KAAK,CAAC,kBAAkB,CAAC,CAAC,aAAa,iBAAiB,CAAC,CAAC,aAAa,EAAE,CAAC,CAAC;YACrF,CAAC;YACD,OAAO,CAAC,KAAK,CACX,yFAAyF,CAC1F,CAAC;YACF,OAAO,CAAC,CAAC;QACX,CAAC;IACH,CAAC;IAED,OAAO,CAAC,GAAG,CAAC,4BAA4B,IAAI,CAAC,MAAM,kBAAkB,CAAC,CAAC;IACvE,MAAM,cAAc,GAAG,MAAM,qBAAqB,CAAC,IAAI,EAAE,IAAI,CAAC,MAAM,EAAE,SAAS,CAAC,CAAC;IAEjF,IAAI,cAAc,CAAC,MAAM,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;QACrC,OAAO,CAAC,KAAK,CACX,KAAK,cAAc,CAAC,MAAM,CAAC,MAAM,0DAA0D,CAC5F,CAAC;QACF,KAAK,MAAM,CAAC,IAAI,cAAc,CAAC,MAAM,EAAE,CAAC;YACtC,OAAO,CAAC,KAAK,CAAC,KAAK,CAAC,CAAC,GAAG,MAAM,CAAC,CAAC,OAAO,EAAE,CAAC,CAAC;QAC7C,CAAC;QACD,OAAO,CAAC,KAAK,CACX,sFAAsF,CACvF,CAAC;QACF,OAAO,CAAC,CAAC;IACX,CAAC;IAED,OAAO,CAAC,GAAG,CAAC,SAAS,cAAc,CAAC,KAAK,aAAa,IAAI,CAAC,MAAM,EAAE,CAAC,CAAC;IACrE,OAAO,CAAC,CAAC;AACX,CAAC;AAED,IAAI,EAAE;KACH,IAAI,CAAC,CAAC,IAAI,EAAE,EAAE,CAAC,OAAO,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC;KAClC,KAAK,CAAC,CAAC,GAAG,EAAE,EAAE;IACb,OAAO,CAAC,KAAK,CAAC,GAAG,CAAC,CAAC;IACnB,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;AAClB,CAAC,CAAC,CAAC"}
@@ -0,0 +1,25 @@
1
+ /**
2
+ * Collect markdown URLs from sitemap.xml and sitemap-unindexed.xml (Option B).
3
+ * Every <loc> in both sitemaps is converted to a .md URL; we then check each returns 200.
4
+ * Download paths and static assets (e.g. PDFs) are excluded — they have no markdown version.
5
+ */
6
+ import type { ValidationResult } from './validate.js';
7
+ /**
8
+ * Collect all page URLs from sitemap.xml and sitemap-unindexed.xml (raw <loc> values).
9
+ * Uses resolved page URL lists when present (e.g. after following sitemap index links).
10
+ * Used for rewrite checks so every sitemap URL is verified to not rewrite to a different path.
11
+ */
12
+ export declare function pageUrlsFromSitemaps(validationResult: ValidationResult): string[];
13
+ /**
14
+ * Collect all markdown URLs from sitemap.xml and sitemap-unindexed.xml (Option B).
15
+ * Uses resolved page URL lists when present (e.g. after following sitemap index links).
16
+ * Ensures every sitemap page has a corresponding markdown URL to validate.
17
+ */
18
+ export declare function collectMarkdownUrlsFromSitemaps(baseUrl: string, validationResult: ValidationResult): string[];
19
+ /**
20
+ * Map production page URLs to development page URLs (same path, dev origin).
21
+ * Excludes non-page assets (download paths, static file extensions).
22
+ * Used in compare mode: sitemap from prod, check existence on dev.
23
+ */
24
+ export declare function mapProdPageUrlsToDev(prodPageUrls: string[], devBaseUrl: string): string[];
25
+ //# sourceMappingURL=collect-markdown-urls.d.ts.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"collect-markdown-urls.d.ts","sourceRoot":"","sources":["../src/collect-markdown-urls.ts"],"names":[],"mappings":"AAAA;;;;GAIG;AAEH,OAAO,KAAK,EAAE,gBAAgB,EAAE,MAAM,eAAe,CAAC;AA+CtD;;;;GAIG;AACH,wBAAgB,oBAAoB,CAAC,gBAAgB,EAAE,gBAAgB,GAAG,MAAM,EAAE,CAajF;AAiCD;;;;GAIG;AACH,wBAAgB,+BAA+B,CAC7C,OAAO,EAAE,MAAM,EACf,gBAAgB,EAAE,gBAAgB,GACjC,MAAM,EAAE,CAGV;AAiBD;;;;GAIG;AACH,wBAAgB,oBAAoB,CAAC,YAAY,EAAE,MAAM,EAAE,EAAE,UAAU,EAAE,MAAM,GAAG,MAAM,EAAE,CAQzF"}
@@ -0,0 +1,144 @@
1
+ /**
2
+ * Collect markdown URLs from sitemap.xml and sitemap-unindexed.xml (Option B).
3
+ * Every <loc> in both sitemaps is converted to a .md URL; we then check each returns 200.
4
+ * Download paths and static assets (e.g. PDFs) are excluded — they have no markdown version.
5
+ */
6
+ /** Path segment extensions for static assets; these URLs are excluded from markdown checks. */
7
+ const STATIC_EXTENSIONS = new Set([
8
+ '.pdf',
9
+ '.zip',
10
+ '.doc',
11
+ '.docx',
12
+ '.xls',
13
+ '.xlsx',
14
+ '.ppt',
15
+ '.pptx',
16
+ '.csv',
17
+ '.txt',
18
+ '.rtf',
19
+ '.odt',
20
+ '.ods',
21
+ '.odp',
22
+ '.epub',
23
+ '.mobi',
24
+ ]);
25
+ /**
26
+ * True if this sitemap URL should not be requested as markdown (download path or static file).
27
+ */
28
+ function isNonPageAssetUrl(url) {
29
+ try {
30
+ const u = new URL(url);
31
+ const path = u.pathname.replace(/\/$/, '') || '/';
32
+ if (path.startsWith('/download/'))
33
+ return true;
34
+ const lastDot = path.lastIndexOf('.');
35
+ if (lastDot === -1)
36
+ return false;
37
+ return STATIC_EXTENSIONS.has(path.slice(lastDot).toLowerCase());
38
+ }
39
+ catch {
40
+ return false;
41
+ }
42
+ }
43
+ function extractLocUrls(xmlBody) {
44
+ const urls = [];
45
+ const re = /<loc>([^<]+)<\/loc>/g;
46
+ for (let m = re.exec(xmlBody); m !== null; m = re.exec(xmlBody)) {
47
+ urls.push(m[1]?.trim() ?? '');
48
+ }
49
+ return urls.filter(Boolean);
50
+ }
51
+ /**
52
+ * Collect all page URLs from sitemap.xml and sitemap-unindexed.xml (raw <loc> values).
53
+ * Uses resolved page URL lists when present (e.g. after following sitemap index links).
54
+ * Used for rewrite checks so every sitemap URL is verified to not rewrite to a different path.
55
+ */
56
+ export function pageUrlsFromSitemaps(validationResult) {
57
+ const pageUrls = new Set();
58
+ const sitemapUrls = validationResult.sitemapPageUrls ??
59
+ (validationResult.sitemapXml.body ? extractLocUrls(validationResult.sitemapXml.body) : []);
60
+ const unindexedUrls = validationResult.sitemapUnindexedPageUrls ??
61
+ (validationResult.sitemapUnindexedXml.body
62
+ ? extractLocUrls(validationResult.sitemapUnindexedXml.body)
63
+ : []);
64
+ for (const url of sitemapUrls)
65
+ pageUrls.add(url);
66
+ for (const url of unindexedUrls)
67
+ pageUrls.add(url);
68
+ return [...pageUrls];
69
+ }
70
+ /**
71
+ * Convert a page URL (e.g. https://example.com/about/) to its markdown URL (https://example.com/about.md).
72
+ */
73
+ function pageUrlToMarkdownUrl(pageUrl, baseUrl) {
74
+ const base = baseUrl.replace(/\/$/, '');
75
+ try {
76
+ const u = new URL(pageUrl);
77
+ if (u.origin !== new URL(base).origin)
78
+ return '';
79
+ const path = u.pathname.replace(/\/$/, '') || '/';
80
+ if (path === '/')
81
+ return `${base}/index.md`;
82
+ return `${base}${path}.md`;
83
+ }
84
+ catch {
85
+ return '';
86
+ }
87
+ }
88
+ /**
89
+ * Build markdown URL list from a list of page URLs (e.g. resolved from sitemaps).
90
+ * Excludes non-page assets and converts each page URL to its .md equivalent.
91
+ */
92
+ function markdownUrlsFromPageUrls(baseUrl, pageUrls) {
93
+ const base = baseUrl.replace(/\/$/, '');
94
+ const markdownUrls = [];
95
+ for (const pageUrl of pageUrls) {
96
+ if (isNonPageAssetUrl(pageUrl))
97
+ continue;
98
+ const mdUrl = pageUrlToMarkdownUrl(pageUrl, base);
99
+ if (mdUrl)
100
+ markdownUrls.push(mdUrl);
101
+ }
102
+ return markdownUrls;
103
+ }
104
+ /**
105
+ * Collect all markdown URLs from sitemap.xml and sitemap-unindexed.xml (Option B).
106
+ * Uses resolved page URL lists when present (e.g. after following sitemap index links).
107
+ * Ensures every sitemap page has a corresponding markdown URL to validate.
108
+ */
109
+ export function collectMarkdownUrlsFromSitemaps(baseUrl, validationResult) {
110
+ const pageUrls = pageUrlsFromSitemaps(validationResult);
111
+ return markdownUrlsFromPageUrls(baseUrl, pageUrls);
112
+ }
113
+ /**
114
+ * Convert a production page URL to the same path on a development base URL.
115
+ * Keeps pathname (and search/hash); only the origin is replaced.
116
+ */
117
+ function toDevPageUrl(prodPageUrl, devBaseUrl) {
118
+ const devBase = devBaseUrl.replace(/\/$/, '');
119
+ try {
120
+ const u = new URL(prodPageUrl);
121
+ const devOrigin = new URL(devBase).origin;
122
+ return `${devOrigin}${u.pathname}${u.search}${u.hash}`;
123
+ }
124
+ catch {
125
+ return '';
126
+ }
127
+ }
128
+ /**
129
+ * Map production page URLs to development page URLs (same path, dev origin).
130
+ * Excludes non-page assets (download paths, static file extensions).
131
+ * Used in compare mode: sitemap from prod, check existence on dev.
132
+ */
133
+ export function mapProdPageUrlsToDev(prodPageUrls, devBaseUrl) {
134
+ const devUrls = [];
135
+ for (const prodUrl of prodPageUrls) {
136
+ if (isNonPageAssetUrl(prodUrl))
137
+ continue;
138
+ const devUrl = toDevPageUrl(prodUrl, devBaseUrl);
139
+ if (devUrl)
140
+ devUrls.push(devUrl);
141
+ }
142
+ return devUrls;
143
+ }
144
+ //# sourceMappingURL=collect-markdown-urls.js.map
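The page-URL to markdown-URL rule in this file (strip the trailing slash, map the root to `index.md`, reject URLs from another origin) can be sketched in isolation. `toMarkdownUrl` is a hypothetical stand-in for the unexported helper, written only to illustrate the rule.

```javascript
// Illustrative re-statement of the conversion rule; not the package's export.
function toMarkdownUrl(pageUrl, baseUrl) {
  const base = baseUrl.replace(/\/$/, '');
  try {
    const u = new URL(pageUrl);
    // URLs on a different origin have no markdown counterpart on this site.
    if (u.origin !== new URL(base).origin) return '';
    const path = u.pathname.replace(/\/$/, '') || '/';
    return path === '/' ? `${base}/index.md` : `${base}${path}.md`;
  } catch {
    return ''; // unparseable page URL or base URL
  }
}

console.log(toMarkdownUrl('https://example.com/about/', 'https://example.com'));
// https://example.com/about.md
console.log(toMarkdownUrl('https://example.com/', 'https://example.com/'));
// https://example.com/index.md
console.log(toMarkdownUrl('https://other.example/about', 'https://example.com') === '');
// true
```

Only `pathname` is used, so any query string or fragment on the sitemap URL is dropped from the markdown URL.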
@@ -0,0 +1 @@
1
+ {"version":3,"file":"collect-markdown-urls.js","sourceRoot":"","sources":["../src/collect-markdown-urls.ts"],"names":[],"mappings":"AAAA;;;;GAIG;AAIH,+FAA+F;AAC/F,MAAM,iBAAiB,GAAG,IAAI,GAAG,CAAC;IAChC,MAAM;IACN,MAAM;IACN,MAAM;IACN,OAAO;IACP,MAAM;IACN,OAAO;IACP,MAAM;IACN,OAAO;IACP,MAAM;IACN,MAAM;IACN,MAAM;IACN,MAAM;IACN,MAAM;IACN,MAAM;IACN,OAAO;IACP,OAAO;CACR,CAAC,CAAC;AAEH;;GAEG;AACH,SAAS,iBAAiB,CAAC,GAAW;IACpC,IAAI,CAAC;QACH,MAAM,CAAC,GAAG,IAAI,GAAG,CAAC,GAAG,CAAC,CAAC;QACvB,MAAM,IAAI,GAAG,CAAC,CAAC,QAAQ,CAAC,OAAO,CAAC,KAAK,EAAE,EAAE,CAAC,IAAI,GAAG,CAAC;QAClD,IAAI,IAAI,CAAC,UAAU,CAAC,YAAY,CAAC;YAAE,OAAO,IAAI,CAAC;QAC/C,MAAM,OAAO,GAAG,IAAI,CAAC,WAAW,CAAC,GAAG,CAAC,CAAC;QACtC,IAAI,OAAO,KAAK,CAAC,CAAC;YAAE,OAAO,KAAK,CAAC;QACjC,OAAO,iBAAiB,CAAC,GAAG,CAAC,IAAI,CAAC,KAAK,CAAC,OAAO,CAAC,CAAC,WAAW,EAAE,CAAC,CAAC;IAClE,CAAC;IAAC,MAAM,CAAC;QACP,OAAO,KAAK,CAAC;IACf,CAAC;AACH,CAAC;AAED,SAAS,cAAc,CAAC,OAAe;IACrC,MAAM,IAAI,GAAa,EAAE,CAAC;IAC1B,MAAM,EAAE,GAAG,sBAAsB,CAAC;IAClC,KAAK,IAAI,CAAC,GAAG,EAAE,CAAC,IAAI,CAAC,OAAO,CAAC,EAAE,CAAC,KAAK,IAAI,EAAE,CAAC,GAAG,EAAE,CAAC,IAAI,CAAC,OAAO,CAAC,EAAE,CAAC;QAChE,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC,CAAC,EAAE,IAAI,EAAE,IAAI,EAAE,CAAC,CAAC;IAChC,CAAC;IACD,OAAO,IAAI,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC;AAC9B,CAAC;AAED;;;;GAIG;AACH,MAAM,UAAU,oBAAoB,CAAC,gBAAkC;IACrE,MAAM,QAAQ,GAAG,IAAI,GAAG,EAAU,CAAC;IACnC,MAAM,WAAW,GACf,gBAAgB,CAAC,eAAe;QAChC,CAAC,gBAAgB,CAAC,UAAU,CAAC,IAAI,CAAC,CAAC,CAAC,cAAc,CAAC,gBAAgB,CAAC,UAAU,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC,EAAE,CAAC,CAAC;IAC7F,MAAM,aAAa,GACjB,gBAAgB,CAAC,wBAAwB;QACzC,CAAC,gBAAgB,CAAC,mBAAmB,CAAC,IAAI;YACxC,CAAC,CAAC,cAAc,CAAC,gBAAgB,CAAC,mBAAmB,CAAC,IAAI,CAAC;YAC3D,CAAC,CAAC,EAAE,CAAC,CAAC;IACV,KAAK,MAAM,GAAG,IAAI,WAAW;QAAE,QAAQ,CAAC,GAAG,CAAC,GAAG,CAAC,CAAC;IACjD,KAAK,MAAM,GAAG,IAAI,aAAa;QAAE,QAAQ,CAAC,GAAG,CAAC,GAAG,CAAC,CAAC;IACnD,OAAO,CAAC,GAAG,QAAQ,CAAC,CAAC;AACvB,CAAC;AAED;;GAEG;AACH,SAAS,oBAAoB,CAAC,OAAe,EAAE,OAAe;IAC5D,MAAM,IAAI,GAAG,OAAO,CAAC,OAAO,CAAC,KAAK,EAAE,EAAE,CAAC,CAAC;IACxC,IAAI,CAAC;QACH,M
AAM,CAAC,GAAG,IAAI,GAAG,CAAC,OAAO,CAAC,CAAC;QAC3B,IAAI,CAAC,CAAC,MAAM,KAAK,IAAI,GAAG,CAAC,IAAI,CAAC,CAAC,MAAM;YAAE,OAAO,EAAE,CAAC;QACjD,MAAM,IAAI,GAAG,CAAC,CAAC,QAAQ,CAAC,OAAO,CAAC,KAAK,EAAE,EAAE,CAAC,IAAI,GAAG,CAAC;QAClD,IAAI,IAAI,KAAK,GAAG;YAAE,OAAO,GAAG,IAAI,WAAW,CAAC;QAC5C,OAAO,GAAG,IAAI,GAAG,IAAI,KAAK,CAAC;IAC7B,CAAC;IAAC,MAAM,CAAC;QACP,OAAO,EAAE,CAAC;IACZ,CAAC;AACH,CAAC;AAED;;;GAGG;AACH,SAAS,wBAAwB,CAAC,OAAe,EAAE,QAAkB;IACnE,MAAM,IAAI,GAAG,OAAO,CAAC,OAAO,CAAC,KAAK,EAAE,EAAE,CAAC,CAAC;IACxC,MAAM,YAAY,GAAa,EAAE,CAAC;IAClC,KAAK,MAAM,OAAO,IAAI,QAAQ,EAAE,CAAC;QAC/B,IAAI,iBAAiB,CAAC,OAAO,CAAC;YAAE,SAAS;QACzC,MAAM,KAAK,GAAG,oBAAoB,CAAC,OAAO,EAAE,IAAI,CAAC,CAAC;QAClD,IAAI,KAAK;YAAE,YAAY,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;IACtC,CAAC;IACD,OAAO,YAAY,CAAC;AACtB,CAAC;AAED;;;;GAIG;AACH,MAAM,UAAU,+BAA+B,CAC7C,OAAe,EACf,gBAAkC;IAElC,MAAM,QAAQ,GAAG,oBAAoB,CAAC,gBAAgB,CAAC,CAAC;IACxD,OAAO,wBAAwB,CAAC,OAAO,EAAE,QAAQ,CAAC,CAAC;AACrD,CAAC;AAED;;;GAGG;AACH,SAAS,YAAY,CAAC,WAAmB,EAAE,UAAkB;IAC3D,MAAM,OAAO,GAAG,UAAU,CAAC,OAAO,CAAC,KAAK,EAAE,EAAE,CAAC,CAAC;IAC9C,IAAI,CAAC;QACH,MAAM,CAAC,GAAG,IAAI,GAAG,CAAC,WAAW,CAAC,CAAC;QAC/B,MAAM,SAAS,GAAG,IAAI,GAAG,CAAC,OAAO,CAAC,CAAC,MAAM,CAAC;QAC1C,OAAO,GAAG,SAAS,GAAG,CAAC,CAAC,QAAQ,GAAG,CAAC,CAAC,MAAM,GAAG,CAAC,CAAC,IAAI,EAAE,CAAC;IACzD,CAAC;IAAC,MAAM,CAAC;QACP,OAAO,EAAE,CAAC;IACZ,CAAC;AACH,CAAC;AAED;;;;GAIG;AACH,MAAM,UAAU,oBAAoB,CAAC,YAAsB,EAAE,UAAkB;IAC7E,MAAM,OAAO,GAAa,EAAE,CAAC;IAC7B,KAAK,MAAM,OAAO,IAAI,YAAY,EAAE,CAAC;QACnC,IAAI,iBAAiB,CAAC,OAAO,CAAC;YAAE,SAAS;QACzC,MAAM,MAAM,GAAG,YAAY,CAAC,OAAO,EAAE,UAAU,CAAC,CAAC;QACjD,IAAI,MAAM;YAAE,OAAO,CAAC,IAAI,CAAC,MAAM,CAAC,CAAC;IACnC,CAAC;IACD,OAAO,OAAO,CAAC;AACjB,CAAC"}
@@ -0,0 +1,19 @@
1
+ /**
2
+ * Download markdown files from URLs and save to output directory preserving path structure.
3
+ */
4
+ interface DownloadResult {
5
+ saved: number;
6
+ skipped: number;
7
+ errors: Array<{
8
+ url: string;
9
+ status?: number;
10
+ message: string;
11
+ }>;
12
+ }
13
+ /**
14
+ * Fetch each markdown URL and write body to outDir/{relativePath}. Create parent dirs as needed.
15
+ * Non-2xx: count as skipped and record in errors (do not write). Return counts and errors.
16
+ */
17
+ export declare function downloadMarkdownFiles(markdownUrls: string[], outDir: string, fetchInit: RequestInit): Promise<DownloadResult>;
18
+ export {};
19
+ //# sourceMappingURL=download.d.ts.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"download.d.ts","sourceRoot":"","sources":["../src/download.ts"],"names":[],"mappings":"AAAA;;GAEG;AAmBH,UAAU,cAAc;IACtB,KAAK,EAAE,MAAM,CAAC;IACd,OAAO,EAAE,MAAM,CAAC;IAChB,MAAM,EAAE,KAAK,CAAC;QAAE,GAAG,EAAE,MAAM,CAAC;QAAC,MAAM,CAAC,EAAE,MAAM,CAAC;QAAC,OAAO,EAAE,MAAM,CAAA;KAAE,CAAC,CAAC;CAClE;AAED;;;GAGG;AACH,wBAAsB,qBAAqB,CACzC,YAAY,EAAE,MAAM,EAAE,EACtB,MAAM,EAAE,MAAM,EACd,SAAS,EAAE,WAAW,GACrB,OAAO,CAAC,cAAc,CAAC,CAwBzB"}
@@ -0,0 +1,48 @@
1
+ /**
2
+ * Download markdown files from URLs and save to output directory preserving path structure.
3
+ */
4
+ import * as fs from 'node:fs';
5
+ import * as path from 'node:path';
6
+ /**
7
+ * Derive relative file path from markdown URL (e.g. .../blog/foo.md -> blog/foo.md, .../index.md -> index.md).
8
+ */
9
+ function urlToRelativePath(markdownUrl) {
10
+ try {
11
+ const u = new URL(markdownUrl);
12
+ let p = u.pathname.replace(/^\//, '') || 'index.md';
13
+ if (p.endsWith('/'))
14
+ p = `${p}index.md`;
15
+ return p || 'index.md';
16
+ }
17
+ catch {
18
+ return 'index.md';
19
+ }
20
+ }
21
+ /**
22
+ * Fetch each markdown URL and write body to outDir/{relativePath}. Create parent dirs as needed.
23
+ * Non-2xx: count as skipped and record in errors (do not write). Return counts and errors.
24
+ */
25
+ export async function downloadMarkdownFiles(markdownUrls, outDir, fetchInit) {
26
+ const result = { saved: 0, skipped: 0, errors: [] };
27
+ for (const url of markdownUrls) {
28
+ const res = await fetch(url, fetchInit);
29
+ if (!res.ok) {
30
+ result.skipped++;
31
+ result.errors.push({
32
+ url,
33
+ status: res.status,
34
+ message: `HTTP ${res.status}`,
35
+ });
36
+ continue;
37
+ }
38
+ const body = await res.text();
39
+ const relativePath = urlToRelativePath(url);
40
+ const filePath = path.join(outDir, relativePath);
41
+ const dir = path.dirname(filePath);
42
+ fs.mkdirSync(dir, { recursive: true });
43
+ fs.writeFileSync(filePath, body, 'utf8');
44
+ result.saved++;
45
+ }
46
+ return result;
47
+ }
48
+ //# sourceMappingURL=download.js.map
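How a markdown URL becomes a path under the output directory can be sketched without touching the file system. `relativePathFor` is a hypothetical name that mirrors the rule shown in the diff: drop the leading slash, fall back to `index.md` for the root, and append `index.md` to directory-style paths.

```javascript
// Illustrative sketch of the URL -> relative file path rule.
function relativePathFor(markdownUrl) {
  try {
    const { pathname } = new URL(markdownUrl);
    let p = pathname.replace(/^\//, '') || 'index.md';
    if (p.endsWith('/')) p = `${p}index.md`;
    return p;
  } catch {
    return 'index.md'; // unparseable URLs all fall back to the same file name
  }
}

console.log(relativePathFor('https://example.com/blog/foo.md')); // blog/foo.md
console.log(relativePathFor('https://example.com/'));            // index.md
console.log(relativePathFor('https://example.com/docs/'));       // docs/index.md
```

Because every unparseable URL maps to `index.md`, two such inputs would overwrite each other. In the flow shown here the URLs come from already-parsed sitemap entries, so this case should be rare.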
@@ -0,0 +1 @@
1
+ {"version":3,"file":"download.js","sourceRoot":"","sources":["../src/download.ts"],"names":[],"mappings":"AAAA;;GAEG;AAEH,OAAO,KAAK,EAAE,MAAM,SAAS,CAAC;AAC9B,OAAO,KAAK,IAAI,MAAM,WAAW,CAAC;AAElC;;GAEG;AACH,SAAS,iBAAiB,CAAC,WAAmB;IAC5C,IAAI,CAAC;QACH,MAAM,CAAC,GAAG,IAAI,GAAG,CAAC,WAAW,CAAC,CAAC;QAC/B,IAAI,CAAC,GAAG,CAAC,CAAC,QAAQ,CAAC,OAAO,CAAC,KAAK,EAAE,EAAE,CAAC,IAAI,UAAU,CAAC;QACpD,IAAI,CAAC,CAAC,QAAQ,CAAC,GAAG,CAAC;YAAE,CAAC,GAAG,GAAG,CAAC,UAAU,CAAC;QACxC,OAAO,CAAC,IAAI,UAAU,CAAC;IACzB,CAAC;IAAC,MAAM,CAAC;QACP,OAAO,UAAU,CAAC;IACpB,CAAC;AACH,CAAC;AAQD;;;GAGG;AACH,MAAM,CAAC,KAAK,UAAU,qBAAqB,CACzC,YAAsB,EACtB,MAAc,EACd,SAAsB;IAEtB,MAAM,MAAM,GAAmB,EAAE,KAAK,EAAE,CAAC,EAAE,OAAO,EAAE,CAAC,EAAE,MAAM,EAAE,EAAE,EAAE,CAAC;IAEpE,KAAK,MAAM,GAAG,IAAI,YAAY,EAAE,CAAC;QAC/B,MAAM,GAAG,GAAG,MAAM,KAAK,CAAC,GAAG,EAAE,SAAS,CAAC,CAAC;QACxC,IAAI,CAAC,GAAG,CAAC,EAAE,EAAE,CAAC;YACZ,MAAM,CAAC,OAAO,EAAE,CAAC;YACjB,MAAM,CAAC,MAAM,CAAC,IAAI,CAAC;gBACjB,GAAG;gBACH,MAAM,EAAE,GAAG,CAAC,MAAM;gBAClB,OAAO,EAAE,QAAQ,GAAG,CAAC,MAAM,EAAE;aAC9B,CAAC,CAAC;YACH,SAAS;QACX,CAAC;QACD,MAAM,IAAI,GAAG,MAAM,GAAG,CAAC,IAAI,EAAE,CAAC;QAC9B,MAAM,YAAY,GAAG,iBAAiB,CAAC,GAAG,CAAC,CAAC;QAC5C,MAAM,QAAQ,GAAG,IAAI,CAAC,IAAI,CAAC,MAAM,EAAE,YAAY,CAAC,CAAC;QACjD,MAAM,GAAG,GAAG,IAAI,CAAC,OAAO,CAAC,QAAQ,CAAC,CAAC;QACnC,EAAE,CAAC,SAAS,CAAC,GAAG,EAAE,EAAE,SAAS,EAAE,IAAI,EAAE,CAAC,CAAC;QACvC,EAAE,CAAC,aAAa,CAAC,QAAQ,EAAE,IAAI,EAAE,MAAM,CAAC,CAAC;QACzC,MAAM,CAAC,KAAK,EAAE,CAAC;IACjB,CAAC;IAED,OAAO,MAAM,CAAC;AAChB,CAAC"}
@@ -0,0 +1,19 @@
1
+ /**
2
+ * Build shared RequestInit (headers) for all fetch calls.
3
+ * Supports generic -H / --header and --vercel-bypass for Vercel Deployment Protection.
4
+ */
5
+ interface FetchOptionsInput {
6
+ /** Headers from -H / --header (e.g. ["Name: value"]) */
7
+ headerArgs?: string[];
8
+ /** Secret for --vercel-bypass; if the flag is set without a secret (null/undefined), the env var is used */
9
+ vercelBypassSecret?: string | null;
10
+ /** Whether --vercel-bypass was passed (so we use env if secret not provided) */
11
+ vercelBypassRequested?: boolean;
12
+ }
13
+ /**
14
+ * Build RequestInit with headers from CLI args.
15
+ * Vercel bypass is applied first; user -H headers override.
16
+ */
17
+ export declare function buildFetchOptions(input: FetchOptionsInput): RequestInit;
18
+ export {};
19
+ //# sourceMappingURL=fetch-options.d.ts.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"fetch-options.d.ts","sourceRoot":"","sources":["../src/fetch-options.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAKH,UAAU,iBAAiB;IACzB,wDAAwD;IACxD,UAAU,CAAC,EAAE,MAAM,EAAE,CAAC;IACtB,iEAAiE;IACjE,kBAAkB,CAAC,EAAE,MAAM,GAAG,IAAI,CAAC;IACnC,gFAAgF;IAChF,qBAAqB,CAAC,EAAE,OAAO,CAAC;CACjC;AAqBD;;;GAGG;AACH,wBAAgB,iBAAiB,CAAC,KAAK,EAAE,iBAAiB,GAAG,WAAW,CAuBvE"}
@@ -0,0 +1,49 @@
1
+ /**
2
+ * Build shared RequestInit (headers) for all fetch calls.
3
+ * Supports generic -H / --header and --vercel-bypass for Vercel Deployment Protection.
4
+ */
5
+ const VERCEL_BYPASS_HEADER = 'x-vercel-protection-bypass';
6
+ const VERCEL_BYPASS_ENV = 'VERCEL_AUTOMATION_BYPASS_SECRET';
7
+ /**
8
+ * Parse a "Name: value" header string. Handles optional quotes around value.
9
+ */
10
+ function parseHeaderArg(arg) {
11
+ const trimmed = arg.trim();
12
+ const colonIdx = trimmed.indexOf(':');
13
+ if (colonIdx <= 0)
14
+ return null;
15
+ const name = trimmed.slice(0, colonIdx).trim();
16
+ let value = trimmed.slice(colonIdx + 1).trim();
17
+ if ((value.startsWith('"') && value.endsWith('"')) ||
18
+ (value.startsWith("'") && value.endsWith("'"))) {
19
+ value = value.slice(1, -1);
20
+ }
21
+ if (!name)
22
+ return null;
23
+ return { name, value };
24
+ }
25
+ /**
26
+ * Build RequestInit with headers from CLI args.
27
+ * Vercel bypass is applied first; user -H headers override.
28
+ */
29
+ export function buildFetchOptions(input) {
30
+ const headers = new Headers();
31
+ // 1. Vercel bypass (if requested)
32
+ const bypassSecret = input.vercelBypassSecret !== undefined && input.vercelBypassSecret !== null
33
+ ? input.vercelBypassSecret
34
+ : input.vercelBypassRequested
35
+ ? (process.env[VERCEL_BYPASS_ENV] ?? '')
36
+ : '';
37
+ if (bypassSecret) {
38
+ headers.set(VERCEL_BYPASS_HEADER, bypassSecret);
39
+ }
40
+ // 2. User -H headers (override bypass if same name)
41
+ for (const arg of input.headerArgs ?? []) {
42
+ const parsed = parseHeaderArg(arg);
43
+ if (parsed) {
44
+ headers.set(parsed.name, parsed.value);
45
+ }
46
+ }
47
+ return { headers };
48
+ }
49
+ //# sourceMappingURL=fetch-options.js.map
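The header handling above (split on the first colon, strip one pair of matching quotes, let `-H` values override the bypass header) can be sketched as below. `parseHeader` and `buildHeaderMap` are hypothetical names, and a `Map` keyed by lower-cased name stands in for the case-insensitive `Headers` object used in the diff.

```javascript
// Parse "Name: value"; returns null when there is no usable header name.
function parseHeader(arg) {
  const trimmed = arg.trim();
  const colonIdx = trimmed.indexOf(':');
  if (colonIdx <= 0) return null;
  const name = trimmed.slice(0, colonIdx).trim();
  let value = trimmed.slice(colonIdx + 1).trim();
  const quoted =
    (value.startsWith('"') && value.endsWith('"')) ||
    (value.startsWith("'") && value.endsWith("'"));
  if (quoted) value = value.slice(1, -1);
  return name ? { name, value } : null;
}

// Bypass header first, then user headers, so a user header with the same name wins.
function buildHeaderMap(bypassSecret, headerArgs) {
  const headers = new Map();
  if (bypassSecret) headers.set('x-vercel-protection-bypass', bypassSecret);
  for (const arg of headerArgs) {
    const parsed = parseHeader(arg);
    if (parsed) headers.set(parsed.name.toLowerCase(), parsed.value);
  }
  return headers;
}

const headers = buildHeaderMap('env-secret', [
  'Authorization: Bearer token',
  'X-Vercel-Protection-Bypass: override',
  'malformed-header',
]);
console.log(headers.get('authorization'));              // Bearer token
console.log(headers.get('x-vercel-protection-bypass')); // override
```

Splitting on the first colon keeps values that contain colons intact, for example a URL passed as a header value.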
@@ -0,0 +1 @@
1
+ {"version":3,"file":"fetch-options.js","sourceRoot":"","sources":["../src/fetch-options.ts"],"names":[],"mappings":"AAAA;;;GAGG;AAEH,MAAM,oBAAoB,GAAG,4BAA4B,CAAC;AAC1D,MAAM,iBAAiB,GAAG,iCAAiC,CAAC;AAW5D;;GAEG;AACH,SAAS,cAAc,CAAC,GAAW;IACjC,MAAM,OAAO,GAAG,GAAG,CAAC,IAAI,EAAE,CAAC;IAC3B,MAAM,QAAQ,GAAG,OAAO,CAAC,OAAO,CAAC,GAAG,CAAC,CAAC;IACtC,IAAI,QAAQ,IAAI,CAAC;QAAE,OAAO,IAAI,CAAC;IAC/B,MAAM,IAAI,GAAG,OAAO,CAAC,KAAK,CAAC,CAAC,EAAE,QAAQ,CAAC,CAAC,IAAI,EAAE,CAAC;IAC/C,IAAI,KAAK,GAAG,OAAO,CAAC,KAAK,CAAC,QAAQ,GAAG,CAAC,CAAC,CAAC,IAAI,EAAE,CAAC;IAC/C,IACE,CAAC,KAAK,CAAC,UAAU,CAAC,GAAG,CAAC,IAAI,KAAK,CAAC,QAAQ,CAAC,GAAG,CAAC,CAAC;QAC9C,CAAC,KAAK,CAAC,UAAU,CAAC,GAAG,CAAC,IAAI,KAAK,CAAC,QAAQ,CAAC,GAAG,CAAC,CAAC,EAC9C,CAAC;QACD,KAAK,GAAG,KAAK,CAAC,KAAK,CAAC,CAAC,EAAE,CAAC,CAAC,CAAC,CAAC;IAC7B,CAAC;IACD,IAAI,CAAC,IAAI;QAAE,OAAO,IAAI,CAAC;IACvB,OAAO,EAAE,IAAI,EAAE,KAAK,EAAE,CAAC;AACzB,CAAC;AAED;;;GAGG;AACH,MAAM,UAAU,iBAAiB,CAAC,KAAwB;IACxD,MAAM,OAAO,GAAG,IAAI,OAAO,EAAE,CAAC;IAE9B,kCAAkC;IAClC,MAAM,YAAY,GAChB,KAAK,CAAC,kBAAkB,KAAK,SAAS,IAAI,KAAK,CAAC,kBAAkB,KAAK,IAAI;QACzE,CAAC,CAAC,KAAK,CAAC,kBAAkB;QAC1B,CAAC,CAAC,KAAK,CAAC,qBAAqB;YAC3B,CAAC,CAAC,CAAC,OAAO,CAAC,GAAG,CAAC,iBAAiB,CAAC,IAAI,EAAE,CAAC;YACxC,CAAC,CAAC,EAAE,CAAC;IACX,IAAI,YAAY,EAAE,CAAC;QACjB,OAAO,CAAC,GAAG,CAAC,oBAAoB,EAAE,YAAY,CAAC,CAAC;IAClD,CAAC;IAED,oDAAoD;IACpD,KAAK,MAAM,GAAG,IAAI,KAAK,CAAC,UAAU,IAAI,EAAE,EAAE,CAAC;QACzC,MAAM,MAAM,GAAG,cAAc,CAAC,GAAG,CAAC,CAAC;QACnC,IAAI,MAAM,EAAE,CAAC;YACX,OAAO,CAAC,GAAG,CAAC,MAAM,CAAC,IAAI,EAAE,MAAM,CAAC,KAAK,CAAC,CAAC;QACzC,CAAC;IACH,CAAC;IAED,OAAO,EAAE,OAAO,EAAE,CAAC;AACrB,CAAC"}
@@ -0,0 +1,32 @@
1
+ /**
2
+ * Fetch and validate sitemap.xml, sitemap-unindexed.xml, and llms.txt.
3
+ * When sitemap.xml or sitemap-unindexed.xml is a sitemap index, follows child
4
+ * sitemaps and collects page URLs from their urlset content.
5
+ */
6
+ export interface ValidationResult {
7
+ sitemapXml: {
8
+ ok: boolean;
9
+ status: number;
10
+ body?: string;
11
+ };
12
+ sitemapUnindexedXml: {
13
+ ok: boolean;
14
+ status: number;
15
+ body?: string;
16
+ warned?: boolean;
17
+ };
18
+ llmsTxt: {
19
+ ok: boolean;
20
+ status: number;
21
+ body?: string;
22
+ };
23
+ /** Page URLs from sitemap.xml (resolved from child sitemaps when root is an index). */
24
+ sitemapPageUrls?: string[];
25
+ /** Page URLs from sitemap-unindexed.xml (resolved from child sitemaps when root is an index). */
26
+ sitemapUnindexedPageUrls?: string[];
27
+ /** True when sitemap.xml was a sitemap index (child sitemaps were followed). */
28
+ sitemapWasIndex?: boolean;
29
+ }
30
+ export declare function validate(baseUrl: string, fetchInit: RequestInit): Promise<ValidationResult>;
31
+ export declare function validationFailed(result: ValidationResult): boolean;
32
+ //# sourceMappingURL=validate.d.ts.map
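How a `ValidationResult` translates into pass or fail can be illustrated with a hand-built result object. The sketch assumes the rule visible in the compiled `validate.js` later in this diff: only `sitemapXml` and `llmsTxt` decide failure, while a missing `sitemap-unindexed.xml` is flagged via `warned`. `hasFailed` is a hypothetical name and the sample values are invented for illustration.

```javascript
// Illustrative check: the unindexed sitemap does not influence the outcome.
function hasFailed(result) {
  return !result.sitemapXml.ok || !result.llmsTxt.ok;
}

// Hand-written sample: unindexed sitemap missing (404), everything else fine.
const sample = {
  sitemapXml: { ok: true, status: 200 },
  sitemapUnindexedXml: { ok: false, status: 404, warned: true },
  llmsTxt: { ok: true, status: 200 },
  sitemapPageUrls: ['https://example.com/'],
};

console.log(hasFailed(sample)); // false
console.log(hasFailed({ ...sample, llmsTxt: { ok: false, status: 404 } })); // true
```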
@@ -0,0 +1 @@
1
+ {"version":3,"file":"validate.d.ts","sourceRoot":"","sources":["../src/validate.ts"],"names":[],"mappings":"AAAA;;;;GAIG;AAEH,MAAM,WAAW,gBAAgB;IAC/B,UAAU,EAAE;QAAE,EAAE,EAAE,OAAO,CAAC;QAAC,MAAM,EAAE,MAAM,CAAC;QAAC,IAAI,CAAC,EAAE,MAAM,CAAA;KAAE,CAAC;IAC3D,mBAAmB,EAAE;QAAE,EAAE,EAAE,OAAO,CAAC;QAAC,MAAM,EAAE,MAAM,CAAC;QAAC,IAAI,CAAC,EAAE,MAAM,CAAC;QAAC,MAAM,CAAC,EAAE,OAAO,CAAA;KAAE,CAAC;IACtF,OAAO,EAAE;QAAE,EAAE,EAAE,OAAO,CAAC;QAAC,MAAM,EAAE,MAAM,CAAC;QAAC,IAAI,CAAC,EAAE,MAAM,CAAA;KAAE,CAAC;IACxD,uFAAuF;IACvF,eAAe,CAAC,EAAE,MAAM,EAAE,CAAC;IAC3B,iGAAiG;IACjG,wBAAwB,CAAC,EAAE,MAAM,EAAE,CAAC;IACpC,gFAAgF;IAChF,eAAe,CAAC,EAAE,OAAO,CAAC;CAC3B;AAwCD,wBAAsB,QAAQ,CAAC,OAAO,EAAE,MAAM,EAAE,SAAS,EAAE,WAAW,GAAG,OAAO,CAAC,gBAAgB,CAAC,CAuCjG;AAED,wBAAgB,gBAAgB,CAAC,MAAM,EAAE,gBAAgB,GAAG,OAAO,CAElE"}
@@ -0,0 +1,79 @@
1
+ /**
2
+ * Fetch and validate sitemap.xml, sitemap-unindexed.xml, and llms.txt.
3
+ * When sitemap.xml or sitemap-unindexed.xml is a sitemap index, follows child
4
+ * sitemaps and collects page URLs from their urlset content.
5
+ */
6
+ function hasLocTag(body) {
7
+ return /<loc>[^<]+<\/loc>/.test(body);
8
+ }
9
+ function extractLocUrls(xmlBody) {
10
+ const urls = [];
11
+ const re = /<loc>([^<]+)<\/loc>/g;
12
+ for (let m = re.exec(xmlBody); m !== null; m = re.exec(xmlBody)) {
13
+ const u = m[1]?.trim();
14
+ if (u)
15
+ urls.push(u);
16
+ }
17
+ return urls;
18
+ }
19
+ function isSitemapIndex(body) {
20
+ return /<sitemapindex[\s>]/.test(body) || body.includes('<sitemap>');
21
+ }
22
+ async function resolveSitemapToPageUrls(body, fetchInit) {
23
+ if (!isSitemapIndex(body)) {
24
+ const pageUrls = extractLocUrls(body);
25
+ return { ok: pageUrls.length > 0 || hasLocTag(body), pageUrls };
26
+ }
27
+ const childUrls = extractLocUrls(body);
28
+ const allPageUrls = [];
29
+ for (const childUrl of childUrls) {
30
+ const res = await fetch(childUrl, fetchInit);
31
+ if (!res.ok)
32
+ return { ok: false, pageUrls: [] };
33
+ const childBody = await res.text();
34
+ // Empty child sitemaps (0 <loc> entries) are valid; they contribute no URLs.
35
+ allPageUrls.push(...extractLocUrls(childBody));
36
+ }
37
+ return { ok: allPageUrls.length > 0, pageUrls: allPageUrls };
38
+ }
39
+ export async function validate(baseUrl, fetchInit) {
40
+ const base = baseUrl.replace(/\/$/, '');
41
+ const result = {
42
+ sitemapXml: { ok: false, status: 0 },
43
+ sitemapUnindexedXml: { ok: false, status: 0 },
44
+ llmsTxt: { ok: false, status: 0 },
45
+ };
46
+ const sitemapRes = await fetch(`${base}/sitemap.xml`, fetchInit);
47
+ result.sitemapXml.status = sitemapRes.status;
48
+ if (sitemapRes.ok) {
49
+ const body = await sitemapRes.text();
50
+ result.sitemapXml.body = body;
51
+ const resolved = await resolveSitemapToPageUrls(body, fetchInit);
52
+ result.sitemapXml.ok = resolved.ok;
53
+ result.sitemapPageUrls = resolved.pageUrls;
54
+ result.sitemapWasIndex = isSitemapIndex(body);
55
+ }
56
+ const unindexedRes = await fetch(`${base}/sitemap-unindexed.xml`, fetchInit);
57
+ result.sitemapUnindexedXml.status = unindexedRes.status;
58
+ if (unindexedRes.status === 404) {
59
+ result.sitemapUnindexedXml.warned = true;
60
+ }
61
+ else if (unindexedRes.ok) {
62
+ const body = await unindexedRes.text();
63
+ result.sitemapUnindexedXml.body = body;
64
+ const resolved = await resolveSitemapToPageUrls(body, fetchInit);
65
+ result.sitemapUnindexedXml.ok = resolved.ok;
66
+ result.sitemapUnindexedPageUrls = resolved.pageUrls;
67
+ }
68
+ const llmsRes = await fetch(`${base}/llms.txt`, fetchInit);
69
+ result.llmsTxt.status = llmsRes.status;
70
+ if (llmsRes.ok) {
71
+ result.llmsTxt.body = await llmsRes.text();
72
+ result.llmsTxt.ok = true;
73
+ }
74
+ return result;
75
+ }
76
+ export function validationFailed(result) {
77
+ return !result.sitemapXml.ok || !result.llmsTxt.ok;
78
+ }
79
+ //# sourceMappingURL=validate.js.map
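The regex-based sitemap handling in this file can be exercised on small inline documents. The sketch below restates the two checks with hypothetical names (`extractLocs`, `looksLikeSitemapIndex`); the XML snippets are invented samples, not output from a real site.

```javascript
// Collect trimmed, non-empty <loc> values.
function extractLocs(xml) {
  return [...xml.matchAll(/<loc>([^<]+)<\/loc>/g)]
    .map((m) => m[1].trim())
    .filter(Boolean);
}

// A document is treated as an index if it has a <sitemapindex> root or a <sitemap> element.
function looksLikeSitemapIndex(xml) {
  return /<sitemapindex[\s>]/.test(xml) || xml.includes('<sitemap>');
}

const urlset = `
<urlset>
  <url><loc> https://example.com/a </loc></url>
  <url><loc>https://example.com/b</loc></url>
</urlset>`;

const index = `
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>
</sitemapindex>`;

console.log(looksLikeSitemapIndex(urlset)); // false
console.log(extractLocs(urlset));           // [ 'https://example.com/a', 'https://example.com/b' ]
console.log(looksLikeSitemapIndex(index));  // true
console.log(extractLocs(index));            // [ 'https://example.com/sitemap-pages.xml' ]
```

The pattern matches only a bare `<loc>` tag with plain text content. A `<loc>` wrapped in CDATA or carrying attributes would not be picked up, which is a trade-off of using a regex instead of an XML parser.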
@@ -0,0 +1 @@
1
+ {"version":3,"file":"validate.js","sourceRoot":"","sources":["../src/validate.ts"],"names":[],"mappings":"AAAA;;;;GAIG;AAcH,SAAS,SAAS,CAAC,IAAY;IAC7B,OAAO,mBAAmB,CAAC,IAAI,CAAC,IAAI,CAAC,CAAC;AACxC,CAAC;AAED,SAAS,cAAc,CAAC,OAAe;IACrC,MAAM,IAAI,GAAa,EAAE,CAAC;IAC1B,MAAM,EAAE,GAAG,sBAAsB,CAAC;IAClC,KAAK,IAAI,CAAC,GAAG,EAAE,CAAC,IAAI,CAAC,OAAO,CAAC,EAAE,CAAC,KAAK,IAAI,EAAE,CAAC,GAAG,EAAE,CAAC,IAAI,CAAC,OAAO,CAAC,EAAE,CAAC;QAChE,MAAM,CAAC,GAAG,CAAC,CAAC,CAAC,CAAC,EAAE,IAAI,EAAE,CAAC;QACvB,IAAI,CAAC;YAAE,IAAI,CAAC,IAAI,CAAC,CAAC,CAAC,CAAC;IACtB,CAAC;IACD,OAAO,IAAI,CAAC;AACd,CAAC;AAED,SAAS,cAAc,CAAC,IAAY;IAClC,OAAO,oBAAoB,CAAC,IAAI,CAAC,IAAI,CAAC,IAAI,IAAI,CAAC,QAAQ,CAAC,WAAW,CAAC,CAAC;AACvE,CAAC;AAED,KAAK,UAAU,wBAAwB,CACrC,IAAY,EACZ,SAAsB;IAEtB,IAAI,CAAC,cAAc,CAAC,IAAI,CAAC,EAAE,CAAC;QAC1B,MAAM,QAAQ,GAAG,cAAc,CAAC,IAAI,CAAC,CAAC;QACtC,OAAO,EAAE,EAAE,EAAE,QAAQ,CAAC,MAAM,GAAG,CAAC,IAAI,SAAS,CAAC,IAAI,CAAC,EAAE,QAAQ,EAAE,CAAC;IAClE,CAAC;IACD,MAAM,SAAS,GAAG,cAAc,CAAC,IAAI,CAAC,CAAC;IACvC,MAAM,WAAW,GAAa,EAAE,CAAC;IACjC,KAAK,MAAM,QAAQ,IAAI,SAAS,EAAE,CAAC;QACjC,MAAM,GAAG,GAAG,MAAM,KAAK,CAAC,QAAQ,EAAE,SAAS,CAAC,CAAC;QAC7C,IAAI,CAAC,GAAG,CAAC,EAAE;YAAE,OAAO,EAAE,EAAE,EAAE,KAAK,EAAE,QAAQ,EAAE,EAAE,EAAE,CAAC;QAChD,MAAM,SAAS,GAAG,MAAM,GAAG,CAAC,IAAI,EAAE,CAAC;QACnC,6EAA6E;QAC7E,WAAW,CAAC,IAAI,CAAC,GAAG,cAAc,CAAC,SAAS,CAAC,CAAC,CAAC;IACjD,CAAC;IACD,OAAO,EAAE,EAAE,EAAE,WAAW,CAAC,MAAM,GAAG,CAAC,EAAE,QAAQ,EAAE,WAAW,EAAE,CAAC;AAC/D,CAAC;AAED,MAAM,CAAC,KAAK,UAAU,QAAQ,CAAC,OAAe,EAAE,SAAsB;IACpE,MAAM,IAAI,GAAG,OAAO,CAAC,OAAO,CAAC,KAAK,EAAE,EAAE,CAAC,CAAC;IACxC,MAAM,MAAM,GAAqB;QAC/B,UAAU,EAAE,EAAE,EAAE,EAAE,KAAK,EAAE,MAAM,EAAE,CAAC,EAAE;QACpC,mBAAmB,EAAE,EAAE,EAAE,EAAE,KAAK,EAAE,MAAM,EAAE,CAAC,EAAE;QAC7C,OAAO,EAAE,EAAE,EAAE,EAAE,KAAK,EAAE,MAAM,EAAE,CAAC,EAAE;KAClC,CAAC;IAEF,MAAM,UAAU,GAAG,MAAM,KAAK,CAAC,GAAG,IAAI,cAAc,EAAE,SAAS,CAAC,CAAC;IACjE,MAAM,CAAC,UAAU,CAAC,MAAM,GAAG,UAAU,CAAC,MAAM,CAAC;IAC7C,IAAI,UAAU,CAAC,EAAE,EAAE,CAAC;QAClB,MAAM,IAAI,GAAG,MAAM,UAAU,CAAC,IAAI,EAAE,CAAC;QACrC,MAAM,
CAAC,UAAU,CAAC,IAAI,GAAG,IAAI,CAAC;QAC9B,MAAM,QAAQ,GAAG,MAAM,wBAAwB,CAAC,IAAI,EAAE,SAAS,CAAC,CAAC;QACjE,MAAM,CAAC,UAAU,CAAC,EAAE,GAAG,QAAQ,CAAC,EAAE,CAAC;QACnC,MAAM,CAAC,eAAe,GAAG,QAAQ,CAAC,QAAQ,CAAC;QAC3C,MAAM,CAAC,eAAe,GAAG,cAAc,CAAC,IAAI,CAAC,CAAC;IAChD,CAAC;IAED,MAAM,YAAY,GAAG,MAAM,KAAK,CAAC,GAAG,IAAI,wBAAwB,EAAE,SAAS,CAAC,CAAC;IAC7E,MAAM,CAAC,mBAAmB,CAAC,MAAM,GAAG,YAAY,CAAC,MAAM,CAAC;IACxD,IAAI,YAAY,CAAC,MAAM,KAAK,GAAG,EAAE,CAAC;QAChC,MAAM,CAAC,mBAAmB,CAAC,MAAM,GAAG,IAAI,CAAC;IAC3C,CAAC;SAAM,IAAI,YAAY,CAAC,EAAE,EAAE,CAAC;QAC3B,MAAM,IAAI,GAAG,MAAM,YAAY,CAAC,IAAI,EAAE,CAAC;QACvC,MAAM,CAAC,mBAAmB,CAAC,IAAI,GAAG,IAAI,CAAC;QACvC,MAAM,QAAQ,GAAG,MAAM,wBAAwB,CAAC,IAAI,EAAE,SAAS,CAAC,CAAC;QACjE,MAAM,CAAC,mBAAmB,CAAC,EAAE,GAAG,QAAQ,CAAC,EAAE,CAAC;QAC5C,MAAM,CAAC,wBAAwB,GAAG,QAAQ,CAAC,QAAQ,CAAC;IACtD,CAAC;IAED,MAAM,OAAO,GAAG,MAAM,KAAK,CAAC,GAAG,IAAI,WAAW,EAAE,SAAS,CAAC,CAAC;IAC3D,MAAM,CAAC,OAAO,CAAC,MAAM,GAAG,OAAO,CAAC,MAAM,CAAC;IACvC,IAAI,OAAO,CAAC,EAAE,EAAE,CAAC;QACf,MAAM,CAAC,OAAO,CAAC,IAAI,GAAG,MAAM,OAAO,CAAC,IAAI,EAAE,CAAC;QAC3C,MAAM,CAAC,OAAO,CAAC,EAAE,GAAG,IAAI,CAAC;IAC3B,CAAC;IAED,OAAO,MAAM,CAAC;AAChB,CAAC;AAED,MAAM,UAAU,gBAAgB,CAAC,MAAwB;IACvD,OAAO,CAAC,MAAM,CAAC,UAAU,CAAC,EAAE,IAAI,CAAC,MAAM,CAAC,OAAO,CAAC,EAAE,CAAC;AACrD,CAAC"}
package/package.json ADDED
@@ -0,0 +1,45 @@
1
+ {
2
+ "name": "@se-studio/site-check",
3
+ "version": "1.0.1",
4
+ "description": "Validate SE marketing sites (sitemap, llms.txt) and download markdown files preserving structure",
5
+ "repository": {
6
+ "type": "git",
7
+ "url": "https://github.com/Something-Else-Studio/se-core-product",
8
+ "directory": "packages/site-check"
9
+ },
10
+ "license": "MIT",
11
+ "type": "module",
12
+ "main": "./dist/cli.js",
13
+ "bin": {
14
+ "site-check": "./dist/cli.js"
15
+ },
16
+ "files": [
17
+ "dist",
18
+ "*.md"
19
+ ],
20
+ "keywords": [
21
+ "sitemap",
22
+ "llms.txt",
23
+ "markdown",
24
+ "validation",
25
+ "cli",
26
+ "vercel",
27
+ "deployment-protection"
28
+ ],
29
+ "homepage": "https://github.com/Something-Else-Studio/se-core-product#readme",
30
+ "bugs": {
31
+ "url": "https://github.com/Something-Else-Studio/se-core-product/issues"
32
+ },
33
+ "devDependencies": {
34
+ "@biomejs/biome": "^2.4.6",
35
+ "@types/node": "^22.19.15",
36
+ "typescript": "^5.9.3"
37
+ },
38
+ "scripts": {
39
+ "build": "tsc --project tsconfig.build.json",
40
+ "dev": "tsc --project tsconfig.build.json --watch",
41
+ "type-check": "tsc --noEmit",
42
+ "lint": "biome lint .",
43
+ "clean": "rm -rf dist .turbo *.tsbuildinfo"
44
+ }
45
+ }