@kirkelabs/agent-readiness-scan 0.1.0

package/AUTHORS ADDED
@@ -0,0 +1,11 @@
1
+ agent-readiness-scan authors
2
+ ============================
3
+
4
+ Lead author:
5
+ Soleman El Gelawi <soleman@kirkelabs.com> — CTO, Kirke Labs
6
+
7
+ Co-author / publisher:
8
+ Steve Kirton <steve@kirkelabs.com> — Founder & CEO, Kirke Labs
9
+
10
+ Organization:
11
+ Kirke Labs — https://www.kirkelabs.com
package/CITATION.cff ADDED
@@ -0,0 +1,30 @@
1
+ cff-version: 1.2.0
2
+ message: "If you use this software, please cite it as below."
3
+ title: "agent-readiness-scan"
4
+ abstract: "Open-source customs-house auditor for AI agents: scores 8 weighted dimensions covering crawler policy, agent-action surfaces (MCP/ACP), Product/Offer completeness, and brand identity corroboration. Generates a drop-in customs declaration (robots.txt + .well-known/ manifests)."
5
+ type: software
6
+ license: MIT
7
+ repository-code: "https://github.com/KirkeLabs/agent-readiness-scan"
8
+ url: "https://kirkelabs.github.io/agent-readiness-scan/"
9
+ version: 0.1.0
10
+ date-released: 2026-06-01
11
+ authors:
12
+ - given-names: Soleman
13
+ family-names: El Gelawi
14
+ email: soleman@kirkelabs.com
15
+ affiliation: "Kirke Labs"
16
+ # sameAs: https://www.linkedin.com/in/soleman-gelawi/ , https://github.com/sgelawi
17
+ - given-names: Steve
18
+ family-names: Kirton
19
+ email: steve@kirkelabs.com
20
+ affiliation: "Kirke Labs"
21
+ # sameAs: https://www.linkedin.com/in/stevekirton-kirkelabs/
22
+ keywords:
23
+ - ai-agents
24
+ - mcp
25
+ - acp
26
+ - agentic-commerce
27
+ - crawler-policy
28
+ - web-bot-auth
29
+ - schema-org
30
+ - algorand
package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Kirke Labs — Soleman El Gelawi and Steve Kirton
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,142 @@
1
+ # agent-readiness-scan
2
+
3
+ [![npm version](https://img.shields.io/npm/v/@kirkelabs/agent-readiness-scan?color=00dc94&style=flat)](https://www.npmjs.com/package/@kirkelabs/agent-readiness-scan)
4
+ [![License: MIT](https://img.shields.io/badge/license-MIT-00dc94?style=flat)](./LICENSE)
5
+ [![Node](https://img.shields.io/badge/node-%3E%3D20-00dc94?style=flat)](https://nodejs.org)
6
+ [![CI](https://img.shields.io/badge/CI-passing-00dc94?style=flat)](./.github/workflows/ci.yml)
7
+
8
+ **Is your brand ready for AI agents to act on it?** Audit your customs-house posture — crawler policy, MCP/ACP exposure, agent-actionable Product/Offer, brand identity corroboration — and get the drop-in files to fix it.
9
+
10
+ ```bash
11
+ npx @kirkelabs/agent-readiness-scan https://your-site.com
12
+ ```
13
+
14
+ No install. No account. No data leaves your machine.
15
+
16
+ > Built by Soleman El Gelawi (CTO, [Kirke Labs](https://www.kirkelabs.com)), with Steve Kirton — open-sourced as a gift to the Algorand ecosystem. MIT licensed. Use it, fork it, ship it.
17
+
18
+ ---
19
+
20
+ ## What is this?
21
+
22
+ The open web is becoming a customs house. AI search, agentic commerce (ACP, Universal Cart), bot authentication (Web Bot Auth), crawler policy (Cloudflare Content Signals), and the EU DSA / DMA all push in the same direction: every web property now needs a *declared access posture*, not just a content strategy.
23
+
24
+ `agent-readiness-scan` audits that posture. It fetches a URL plus seven `.well-known/*` paths plus `robots.txt`, and scores 8 dimensions covering:
25
+
26
+ - **Crawler policy** — does your `robots.txt` name the major AI bots individually, with declared use-policy signals?
27
+ - **Bot authentication** — is a Web Bot Auth key directory present?
28
+ - **Agent action surfaces** — MCP server card, Agentic Commerce Protocol manifest, Google Universal Cart manifest?
29
+ - **Commerce structured data** — are your Product/Offer JSON-LD blocks complete enough for agent-driven checkout?
30
+ - **Identity corroboration** — does the `sameAs` graph reach registry-grade sources (Wikidata, Crunchbase, Companies House, SEC EDGAR, GLEIF)?
31
+ - **Source operations & regulatory transparency** — dateModified, security.txt, T&Cs, contact, privacy.
32
+
33
+ Then it generates the files you need to fix the gaps — a drop-in `robots.txt`, `.well-known/security.txt`, MCP server card, and ACP manifest scaffolds.
34
+
35
+ Companion to [`@kirkelabs/ai-legibility-scan`](https://github.com/KirkeLabs/ai-legibility-scan): that one scores how *legible* your site is to an AI crawler. This one scores how *agent-ready* it is once the crawler can read it.
36
+
37
+ ## Why?
38
+
39
+ The strategic paper this tool is built on — [*The Web Becomes a Customs House*](https://www.kirkelabs.com/papers/customs-house) — argues that the new web bargain is declared-access-for-action. A page may be cited without being visited; a product may be transacted without a click. Existing "AI visibility" tools tell you you're invisible. This one is a free CLI that audits your *customs-house posture* and hands you the drop-in declarations to fix it.
40
+
41
+ ## Install
42
+
43
+ Nothing to install — use `npx`:
44
+
45
+ ```bash
46
+ npx @kirkelabs/agent-readiness-scan https://your-site.com
47
+ ```
48
+
49
+ Or add it to a project:
50
+
51
+ ```bash
52
+ npm i -D @kirkelabs/agent-readiness-scan
53
+ ```
54
+
55
+ Requires Node.js ≥ 20.
56
+
57
+ ## Quickstart
58
+
59
+ ```bash
60
+ # default scan
61
+ npx @kirkelabs/agent-readiness-scan https://your-site.com
62
+
63
+ # write artefacts to ./report
64
+ npx @kirkelabs/agent-readiness-scan https://your-site.com --out ./report
65
+
66
+ # machine-readable output for scripting
67
+ npx @kirkelabs/agent-readiness-scan https://your-site.com --json
68
+ ```
69
+
70
+ Files land in the output directory (default `./agent-readiness-out/`):
71
+
72
+ | File / Directory | What it is |
73
+ |---|---|
74
+ | `score.json` | Machine-readable result — gate your CI on it |
75
+ | `report.md` | Human-readable findings |
76
+ | `scorecard.html` | Self-contained shareable scorecard |
77
+ | `customs-declaration/robots.txt` | Drop-in robots.txt with per-AI-bot rules + Cloudflare Content Signals |
78
+ | `customs-declaration/.well-known/security.txt` | RFC 9116 scaffold |
79
+ | `customs-declaration/.well-known/mcp/server-card.json` | MCP server card scaffold |
80
+ | `customs-declaration/.well-known/acp/manifest.json` | Agentic Commerce Protocol manifest scaffold |
81
+
82
+ ## How it scores
83
+
84
+ Eight weighted dimensions, normalised to 0–100 and graded A–F:
85
+
86
+ | # | Dimension | Weight | What it checks |
87
+ |---|---|---|---|
88
+ | 1 | Per-bot crawler policy | 10 | robots.txt names individual AI bots (GPTBot, ClaudeBot, OAI-SearchBot, PerplexityBot, Google-Extended, anthropic-ai, Claude-Web, ChatGPT-User, Claude-User, CCBot, Bytespider, Amazonbot, Applebot-Extended, meta-externalagent) |
89
+ | 2 | Declared use-policy signals | 7 | Cloudflare Content Signals (search / ai-input / ai-train), `noai` / `noimageai` meta, `X-Robots-Tag` |
90
+ | 3 | Bot-Auth readiness | 5 | `/.well-known/http-message-signatures-directory` (Web Bot Auth, IETF draft) |
91
+ | 4 | MCP exposure | 7 | `/.well-known/mcp/server-card.json` + `/.well-known/oauth-protected-resource` with PKCE/S256 (NSA May-2026 guidance) |
92
+ | 5 | Agentic-commerce manifests | 7 | `/.well-known/acp/manifest.json` (OpenAI/Stripe) and/or `/.well-known/ucp` (Google Universal Cart) |
93
+ | 6 | Agent-actionable Product/Offer | 7 | Product/Offer JSON-LD completeness (price, availability, priceValidUntil-future, shippingDetails, acceptedPaymentMethod, hasMerchantReturnPolicy, aggregateRating) |
94
+ | 7 | Brand identity corroboration | 8 | sameAs to registry-grade sources (Wikidata, Crunchbase, OpenCorporates, Companies House, SEC EDGAR, GLEIF, plus LinkedIn/GitHub) |
95
+ | 8 | Source provenance & regulatory | 5 | dateModified/datePublished, security.txt, T&Cs, contact, privacy policy |
96
+
97
+ Full rubric, thresholds and rationale: **[docs/METHODOLOGY.md](./docs/METHODOLOGY.md)**.
98
+
99
+ ## Use in CI
100
+
101
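+ The CLI exits with code 3 when the score drops below 50 and code 2 when the site cannot be scanned (with `--json` it prints the result and exits 0 whatever the score, so check the `score` field in that mode):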
102
+
103
+ ```yaml
104
+ # .github/workflows/agent-readiness.yml
105
+ - run: npx @kirkelabs/agent-readiness-scan https://staging.your-site.com
106
+ ```
107
+
108
+ ## Programmatic use
109
+
110
+ ```js
111
+ import { scan } from '@kirkelabs/agent-readiness-scan';
112
+
113
+ const result = await scan('https://your-site.com');
114
+ console.log(result.score, result.grade);
115
+ ```
116
+
117
+ ## Limitations (read this)
118
+
119
+ This tool measures **heuristic indicators** of agent-readiness. A high score makes a site easier for an AI agent to discover, declare access to, and act on — it is **not a guarantee** of agent uptake, citation, or transaction. The weights are informed by 2026 standards work (MCP, ACP, UCP, Web Bot Auth, Content Signals) but are judgement calls, documented openly in [docs/METHODOLOGY.md](./docs/METHODOLOGY.md). See also [`SECURITY.md`](./SECURITY.md).
120
+
121
+ Most of the dimensions check standards that are *emerging*, not universal. A v0.1.0 score below 50 is normal today; a score above 80 puts you among the earliest customs-house operators. The bar will rise.
122
+
123
+ ## Audit, recon, fix — three steps to lift your score
124
+
125
+ Once the scanner has graded your site, two prompt templates let Claude Code in your source repo do the rest:
126
+
127
+ 1. **[docs/RECON_PROMPT.md](./docs/RECON_PROMPT.md)** — read-only reconnaissance prompt that greps the codebase and returns a structured report of your framework, existing manifests, identity URLs, and routes.
128
+ 2. **[docs/PROMPT_TEMPLATE.md](./docs/PROMPT_TEMPLATE.md)** — the fix prompt. Fill in the placeholders informed by the recon, paste into a new Claude Code session to ship the customs declaration.
129
+
130
+ ## Companion tool
131
+
132
+ See also [`@kirkelabs/ai-legibility-scan`](https://github.com/KirkeLabs/ai-legibility-scan) — scores how legible your page is to AI *crawlers* (the layer below this one). Together they cover the audit-recon-fix loop for both halves of the customs-house thesis: legibility + declared access.
133
+
134
+ ## Contributing
135
+
136
+ Issues and PRs welcome — especially scoring false positives, new checks tracking emerging standards, and additional identity-registry coverage. See [CONTRIBUTING.md](./CONTRIBUTING.md) and the [Code of Conduct](./CODE_OF_CONDUCT.md).
137
+
138
+ ## Licence
139
+
140
+ [MIT](./LICENSE) © 2026 Kirke Labs — Soleman El Gelawi and Steve Kirton. A genuine gift to the community — attribution appreciated, not required.
141
+
142
+ — [www.kirkelabs.com](https://www.kirkelabs.com)
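The README's `score.json` row and "Use in CI" section suggest gating a pipeline on the scan result. A minimal sketch of such a gate follows, assuming `score.json` carries the `ok`, `score`, and `grade` fields that `bin/cli.js` reads from the scan result. The `gateExitCode` helper and its `minScore` parameter are hypothetical and not part of the package; the sample object stands in for the parsed file.

```javascript
// Hypothetical helper: map a parsed score.json object to a process exit code.
// Field names (ok, score, grade) follow what bin/cli.js reads from the result.
function gateExitCode(result, minScore = 50) {
  if (!result || result.ok === false) return 2; // the scan itself failed
  if (typeof result.score !== 'number') return 2; // malformed or partial file
  return result.score >= minScore ? 0 : 3; // same 0/3 split the CLI uses
}

// Inline sample standing in for the parsed contents of score.json.
// In CI, load the real file instead, for example with
// JSON.parse(fs.readFileSync('agent-readiness-out/score.json', 'utf8')).
const sampleResult = { ok: true, score: 42, grade: 'D' };
const exitCode = gateExitCode(sampleResult, 60);
console.log(`score ${sampleResult.score} (${sampleResult.grade}) -> exit ${exitCode}`);
```

Keeping the threshold as a parameter lets a team raise the bar above the CLI's fixed 50 without patching the tool.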
package/bin/cli.js ADDED
@@ -0,0 +1,172 @@
1
+ #!/usr/bin/env node
2
+ /**
3
+ * bin/cli.js — command-line entry point.
4
+ *
5
+ * Usage:
6
+ * npx @kirkelabs/agent-readiness-scan https://example.com
7
+ * npx @kirkelabs/agent-readiness-scan https://example.com --out ./report
8
+ * npx @kirkelabs/agent-readiness-scan https://example.com --json
9
+ */
10
+
11
+ import { writeFile, mkdir } from 'node:fs/promises';
12
+ import { resolve, join, dirname } from 'node:path';
13
+ import { scan } from '../src/index.js';
14
+ import { renderScorecard } from '../src/scorecard.js';
15
+
16
+ const RESET = '\x1b[0m';
17
+ const DIM = '\x1b[2m';
18
+ const BOLD = '\x1b[1m';
19
+ const GREEN = '\x1b[32m';
20
+ const YELLOW = '\x1b[33m';
21
+ const RED = '\x1b[31m';
22
+ const CYAN = '\x1b[36m';
23
+
24
+ function parseArgs(argv) {
25
+ const args = { url: null, out: './agent-readiness-out', json: false };
26
+ for (let i = 2; i < argv.length; i++) {
27
+ const a = argv[i];
28
+ if (a === '--out') args.out = argv[++i];
29
+ else if (a === '--json') args.json = true;
30
+ else if (a === '--help' || a === '-h') args.help = true;
31
+ else if (!a.startsWith('-')) args.url = a;
32
+ }
33
+ return args;
34
+ }
35
+
36
+ function help() {
37
+ console.log(`
38
+ ${BOLD}agent-readiness-scan${RESET} — is your brand ready for AI agents to act on it?
39
+
40
+ ${BOLD}Usage${RESET}
41
+ npx @kirkelabs/agent-readiness-scan <url> [options]
42
+
43
+ ${BOLD}Options${RESET}
44
+ --out <dir> Output directory (default: ./agent-readiness-out)
45
+ --json Print machine-readable JSON to stdout (good for CI)
46
+ -h, --help Show this help
47
+
48
+ ${BOLD}Outputs written to <dir>${RESET}
49
+ score.json Machine-readable result (CI-gateable)
50
+ report.md Human-readable report
51
+ scorecard.html Shareable static scorecard
52
+ customs-declaration/ Drop-in policy files
53
+ robots.txt
54
+ .well-known/security.txt
55
+ .well-known/mcp/server-card.json
56
+ .well-known/acp/manifest.json
57
+
58
+ MIT · Kirke Labs · www.kirkelabs.com
59
+ `);
60
+ }
61
+
62
+ function color(level) {
63
+ return level === 'pass'
64
+ ? GREEN
65
+ : level === 'warn'
66
+ ? YELLOW
67
+ : level === 'fail'
68
+ ? RED
69
+ : DIM;
70
+ }
71
+
72
+ function bar(pct, width = 22) {
73
+ const fill = Math.round((pct / 100) * width);
74
+ return '█'.repeat(fill) + '░'.repeat(width - fill);
75
+ }
76
+
77
+ async function writeArtefact(outDir, relPath, content) {
78
+ const fullPath = join(outDir, relPath);
79
+ await mkdir(dirname(fullPath), { recursive: true });
80
+ await writeFile(fullPath, content);
81
+ }
82
+
83
+ async function main() {
84
+ const args = parseArgs(process.argv);
85
+ if (args.help || !args.url) {
86
+ help();
87
+ process.exit(args.help ? 0 : 1);
88
+ }
89
+
90
+ let url = args.url;
91
+ if (!/^https?:\/\//i.test(url)) url = 'https://' + url;
92
+
93
+ if (!args.json) {
94
+ console.log(`\n${CYAN}⟶ Scanning ${BOLD}${url}${RESET}${CYAN} for agent-readiness…${RESET}\n`);
95
+ }
96
+
97
+ const result = await scan(url);
98
+
99
+ if (args.json) {
100
+ process.stdout.write(JSON.stringify(result, null, 2) + '\n');
101
+ return;
102
+ }
103
+
104
+ if (!result.ok) {
105
+ console.error(`${RED}✗ Could not scan: ${result.error}${RESET}\n`);
106
+ process.exit(2);
107
+ }
108
+
109
+ const gColor =
110
+ result.grade === 'A' || result.grade === 'B'
111
+ ? GREEN
112
+ : result.grade === 'C'
113
+ ? YELLOW
114
+ : RED;
115
+
116
+ console.log(
117
+ `${BOLD} Agent-Readiness Score: ${gColor}${result.score}/100 (${result.grade})${RESET}\n`,
118
+ );
119
+
120
+ for (const d of result.dimensions) {
121
+ const pct = Math.round((d.score / d.max) * 100);
122
+ const c = pct >= 70 ? GREEN : pct >= 40 ? YELLOW : RED;
123
+ console.log(
124
+ ` ${c}${bar(pct)}${RESET} ${d.title} ${DIM}(${d.score}/${d.max}, weight ${d.weight})${RESET}`,
125
+ );
126
+ for (const f of d.findings) {
127
+ console.log(` ${color(f.level)}•${RESET} ${f.msg}`);
128
+ }
129
+ console.log('');
130
+ }
131
+
132
+ // Write artefacts.
133
+ const outDir = resolve(args.out);
134
+ await mkdir(outDir, { recursive: true });
135
+ await writeFile(join(outDir, 'score.json'), JSON.stringify(result, null, 2));
136
+ await writeFile(join(outDir, 'report.md'), toMarkdown(result));
137
+ await writeFile(join(outDir, 'scorecard.html'), renderScorecard(result));
138
+
139
+ // Customs declaration files.
140
+ for (const [relPath, content] of Object.entries(result.generated)) {
141
+ await writeArtefact(join(outDir, 'customs-declaration'), relPath, content);
142
+ }
143
+
144
+ console.log(`${DIM} Artefacts written to ${outDir}/${RESET}`);
145
+ console.log(
146
+ `${DIM} score.json · report.md · scorecard.html · customs-declaration/${RESET}\n`,
147
+ );
148
+ console.log(
149
+ `${DIM} Heuristic indicators, not a guarantee of agent action. See docs/METHODOLOGY.md${RESET}\n`,
150
+ );
151
+
152
+ process.exit(result.score >= 50 ? 0 : 3);
153
+ }
154
+
155
+ function toMarkdown(r) {
156
+ let md = `# Agent-Readiness Report\n\n`;
157
+ md += `**URL:** ${r.url} \n**Score:** ${r.score}/100 (${r.grade}) \n`;
158
+ md += `**Scanned:** ${r.scannedAt}\n\n`;
159
+ md += `> Heuristic indicators of how ready this brand is for AI agents to discover, declare access to, and act on. Not a guarantee of agent uptake.\n\n`;
160
+ for (const d of r.dimensions) {
161
+ md += `## ${d.title} — ${d.score}/${d.max}\n\n_${d.why}_\n\n`;
162
+ for (const f of d.findings) md += `- **${f.level.toUpperCase()}** — ${f.msg}\n`;
163
+ md += `\n`;
164
+ }
165
+ md += `---\n\nGenerated by [\`@kirkelabs/agent-readiness-scan\`](https://github.com/KirkeLabs/agent-readiness-scan) — MIT. Built by Soleman El Gelawi (CTO, Kirke Labs), with Steve Kirton (www.kirkelabs.com) as a gift to the Algorand ecosystem.\n`;
166
+ return md;
167
+ }
168
+
169
+ main().catch((e) => {
170
+ console.error(`${RED}Unexpected error:${RESET}`, e);
171
+ process.exit(1);
172
+ });
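`parseArgs` in `bin/cli.js` takes whatever follows `--out` as the directory, so a trailing `--out` with no value leaves `args.out` undefined, and unknown flags are silently ignored. A sketch of a stricter variant is shown here, assuming the same three flags. The `error` field and the name `parseArgsStrict` are hypothetical additions for illustration, not something the shipped parser has.

```javascript
// Sketch of a stricter parser for the same flags as bin/cli.js.
// The `error` field is hypothetical; the shipped parser has no such field.
function parseArgsStrict(argv) {
  const args = {
    url: null,
    out: './agent-readiness-out',
    json: false,
    help: false,
    error: null,
  };
  for (let i = 2; i < argv.length; i++) {
    const a = argv[i];
    if (a === '--out') {
      const next = argv[i + 1];
      // Reject a missing value or one that looks like another flag.
      if (next === undefined || next.startsWith('-')) {
        args.error = '--out requires a directory';
      } else {
        args.out = next;
        i++; // consume the directory argument
      }
    } else if (a === '--json') args.json = true;
    else if (a === '--help' || a === '-h') args.help = true;
    else if (a.startsWith('-')) args.error = `unknown option: ${a}`;
    else args.url = a;
  }
  return args;
}

console.log(parseArgsStrict(['node', 'cli.js', 'example.com', '--out']));
```

One trade-off of this design: a directory whose name starts with `-` is rejected, which seems acceptable for a CLI whose default output path is relative.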
package/package.json ADDED
@@ -0,0 +1,64 @@
1
+ {
2
+ "name": "@kirkelabs/agent-readiness-scan",
3
+ "version": "0.1.0",
4
+ "description": "Audit a website's customs-house posture for AI agents. Scores 8 dimensions — crawler policy, bot auth, MCP/ACP exposure, agent-actionable Product/Offer, brand identity corroboration, regulatory transparency — and generates a drop-in customs declaration (robots.txt + .well-known/ manifests). A gift to the Algorand ecosystem from Kirke Labs.",
5
+ "type": "module",
6
+ "bin": {
7
+ "agent-readiness-scan": "bin/cli.js"
8
+ },
9
+ "exports": {
10
+ ".": "./src/index.js"
11
+ },
12
+ "files": [
13
+ "bin/",
14
+ "src/",
15
+ "LICENSE",
16
+ "README.md",
17
+ "AUTHORS",
18
+ "CITATION.cff"
19
+ ],
20
+ "scripts": {
21
+ "scan": "node bin/cli.js",
22
+ "test": "node --test \"test/*.test.js\"",
23
+ "lint": "eslint .",
24
+ "format": "prettier --write \"**/*.{js,json,md}\""
25
+ },
26
+ "keywords": [
27
+ "ai-agents",
28
+ "mcp",
29
+ "acp",
30
+ "agentic-commerce",
31
+ "crawler-policy",
32
+ "web-bot-auth",
33
+ "content-signals",
34
+ "universal-cart",
35
+ "schema-org",
36
+ "structured-data",
37
+ "algorand",
38
+ "cli",
39
+ "nodejs"
40
+ ],
41
+ "author": "Soleman El Gelawi <soleman@kirkelabs.com> (https://www.kirkelabs.com)",
42
+ "contributors": [
43
+ "Steve Kirton <steve@kirkelabs.com> (https://www.kirkelabs.com)"
44
+ ],
45
+ "license": "MIT",
46
+ "homepage": "https://kirkelabs.github.io/agent-readiness-scan/",
47
+ "repository": {
48
+ "type": "git",
49
+ "url": "git+https://github.com/KirkeLabs/agent-readiness-scan.git"
50
+ },
51
+ "bugs": {
52
+ "url": "https://github.com/KirkeLabs/agent-readiness-scan/issues"
53
+ },
54
+ "engines": {
55
+ "node": ">=20"
56
+ },
57
+ "dependencies": {
58
+ "cheerio": "^1.0.0"
59
+ },
60
+ "devDependencies": {
61
+ "eslint": "^9.0.0",
62
+ "prettier": "^3.0.0"
63
+ }
64
+ }
@@ -0,0 +1,124 @@
1
+ /**
2
+ * check 01 — Per-bot crawler policy
3
+ *
4
+ * The "customs house" foundation. Does robots.txt name the major AI bots
5
+ * individually with explicit allow/disallow rules, or is the site silent
6
+ * (default-permissive) about them? The article's central argument is that
7
+ * the new web bargain is declared-role-for-access — and the first
8
+ * declaration is your robots.txt directive per bot UA.
9
+ *
10
+ * Categories:
11
+ * training — GPTBot, ClaudeBot, Google-Extended, anthropic-ai,
12
+ * CCBot, Bytespider, Amazonbot, Applebot-Extended,
13
+ * meta-externalagent
14
+ * grounding — OAI-SearchBot, PerplexityBot, Claude-Web
15
+ * user-directed — ChatGPT-User, Claude-User
16
+ */
17
+
18
+ export const meta = {
19
+ id: 'per-bot-policy',
20
+ title: 'Per-bot crawler policy',
21
+ weight: 10,
22
+ why: 'robots.txt is the customs declaration. A site that names individual AI bots with explicit rules is bargaining; one that is silent is default-permissive on every front.',
23
+ };
24
+
25
+ const TRAINING_BOTS = [
26
+ 'GPTBot',
27
+ 'ClaudeBot',
28
+ 'Google-Extended',
29
+ 'anthropic-ai',
30
+ 'CCBot',
31
+ 'Bytespider',
32
+ 'Amazonbot',
33
+ 'Applebot-Extended',
34
+ 'meta-externalagent',
35
+ ];
36
+ const GROUNDING_BOTS = ['OAI-SearchBot', 'PerplexityBot', 'Claude-Web'];
37
+ const USER_DIRECTED_BOTS = ['ChatGPT-User', 'Claude-User'];
38
+
39
+ export function run({ robotsTxt }) {
40
+ const findings = [];
41
+
42
+ if (!robotsTxt) {
43
+ findings.push({
44
+ level: 'fail',
45
+ msg: 'No robots.txt found. Every AI bot is implicitly allowed — the site has not declared any access posture.',
46
+ });
47
+ return { score: 0, max: 10, findings, detail: { named: [] } };
48
+ }
49
+
50
+ // Find every User-agent block name (case-insensitive matching against bot list).
51
+ const named = new Set();
52
+ const lines = robotsTxt.split(/\r?\n/);
53
+ for (const line of lines) {
54
+ const m = /^\s*user-agent\s*:\s*(.+?)\s*$/i.exec(line);
55
+ if (!m) continue;
56
+ const ua = m[1].trim();
57
+ if (ua === '*') continue;
58
+ for (const bot of [...TRAINING_BOTS, ...GROUNDING_BOTS, ...USER_DIRECTED_BOTS]) {
59
+ if (ua.toLowerCase() === bot.toLowerCase()) named.add(bot);
60
+ }
61
+ }
62
+
63
+ const namedTraining = TRAINING_BOTS.filter((b) => named.has(b));
64
+ const namedGrounding = GROUNDING_BOTS.filter((b) => named.has(b));
65
+ const namedUserDirected = USER_DIRECTED_BOTS.filter((b) => named.has(b));
66
+ const totalNamed = named.size;
67
+
68
+ // Score by breadth + category coverage.
69
+ let score;
70
+ if (totalNamed === 0) {
71
+ score = 3;
72
+ findings.push({
73
+ level: 'warn',
74
+ msg: 'robots.txt exists but names no AI bots individually. To an AI customs officer this looks like a default-permissive port.',
75
+ });
76
+ } else if (totalNamed <= 3) {
77
+ score = 6;
78
+ findings.push({
79
+ level: 'warn',
80
+ msg: `${totalNamed} AI bot(s) named explicitly (${[...named].join(', ')}). A start, but coverage of the major training + grounding + user-directed crawlers is thin.`,
81
+ });
82
+ } else if (totalNamed <= 7) {
83
+ score = 8;
84
+ findings.push({
85
+ level: 'pass',
86
+ msg: `${totalNamed} AI bots named explicitly. Good coverage breadth.`,
87
+ });
88
+ } else {
89
+ score = 10;
90
+ findings.push({
91
+ level: 'pass',
92
+ msg: `${totalNamed} AI bots named explicitly — comprehensive customs declaration.`,
93
+ });
94
+ }
95
+
96
+ // Category coverage: penalise single-category rules; full coverage only adds a finding.
97
+ const categories = [
98
+ namedTraining.length > 0,
99
+ namedGrounding.length > 0,
100
+ namedUserDirected.length > 0,
101
+ ].filter(Boolean).length;
102
+ if (totalNamed > 0 && categories < 2) {
103
+ findings.push({
104
+ level: 'warn',
105
+ msg: 'Rules cover only one of the three AI-bot categories (training / grounding / user-directed). Distinguish between them — a bot training a foundation model is not the same kind of visitor as one fetching on a user\'s behalf.',
106
+ });
107
+ score = Math.max(score - 1, 3);
108
+ } else if (categories === 3) {
109
+ findings.push({
110
+ level: 'pass',
111
+ msg: 'Rules cover training, grounding, and user-directed bots — the three customs declarations a 2026 AI port expects.',
112
+ });
113
+ }
114
+
115
+ return {
116
+ score,
117
+ max: 10,
118
+ findings,
119
+ detail: {
120
+ named: [...named],
121
+ categoryCoverage: { training: namedTraining, grounding: namedGrounding, userDirected: namedUserDirected },
122
+ },
123
+ };
124
+ }
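The matching loop in `run` compares each `User-agent` value against the bot lists case-insensitively and skips the `*` wildcard. A self-contained sketch of that extraction step follows, using a shortened bot list and a made-up robots.txt. The name `namedBots` is hypothetical, and stripping trailing `#` comments is an addition that check 01 itself does not perform.

```javascript
// Shortened, illustrative bot list; the real check uses three longer lists.
const BOTS = ['GPTBot', 'ClaudeBot', 'OAI-SearchBot', 'ChatGPT-User'];

// Hypothetical helper: return the listed bots that robots.txt names explicitly.
function namedBots(robotsTxt) {
  const named = new Set();
  for (const line of robotsTxt.split(/\r?\n/)) {
    // Drop trailing "# comment" text (an addition; check 01 does not do this).
    const withoutComment = line.replace(/#.*$/, '');
    const m = /^\s*user-agent\s*:\s*(.+?)\s*$/i.exec(withoutComment);
    if (!m || m[1] === '*') continue; // not a User-agent line, or the wildcard
    const ua = m[1].toLowerCase();
    const bot = BOTS.find((b) => b.toLowerCase() === ua);
    if (bot) named.add(bot); // store the canonical spelling
  }
  return [...named];
}

// Made-up robots.txt for illustration.
const sampleRobots = [
  'User-agent: *',
  'Allow: /',
  '',
  'user-agent: gptbot  # training crawler',
  'Disallow: /',
  '',
  'User-agent: ChatGPT-User',
  'Allow: /',
].join('\n');

console.log(namedBots(sampleRobots));
```

With two bots named across two categories (training and user-directed), the tiers in check 01 would place this file in the "1 to 3 bots" band without the single-category penalty.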
@@ -0,0 +1,93 @@
1
+ /**
2
+ * check 02 — Declared use-policy signals
3
+ *
4
+ * Beyond robots.txt allow/disallow, has the site declared *why* — i.e.
5
+ * which uses of the content are permitted? Cloudflare Content Signals
6
+ * separates search / ai-input / ai-train. The noai and noimageai meta
7
+ * tags / X-Robots-Tag headers signal the same intent at the page level.
8
+ *
9
+ * Default-permissive silence is the failure mode this check catches.
10
+ */
11
+
12
+ export const meta = {
13
+ id: 'declared-use-signals',
14
+ title: 'Declared use-policy signals',
15
+ weight: 7,
16
+ why: 'Allow-or-disallow alone is binary. Real bargaining means saying which uses (search, ai-input, ai-train) you permit. Cloudflare Content Signals + noai meta tags + X-Robots-Tag turn silence into a declaration.',
17
+ };
18
+
19
+ export function run({ $, robotsTxt, headers }) {
20
+ const findings = [];
21
+ let score = 0;
22
+ const detail = {
23
+ contentSignal: { found: false, parts: [] },
24
+ noaiMeta: false,
25
+ noimageaiMeta: false,
26
+ xRobotsTagNoai: false,
27
+ };
28
+
29
+ // 1. Cloudflare Content Signals in robots.txt
30
+ if (robotsTxt) {
31
+ const csMatch = /^\s*content-signal\s*:\s*(.+)$/im.exec(robotsTxt);
32
+ if (csMatch) {
33
+ detail.contentSignal.found = true;
34
+ const parts = csMatch[1].split(',').map((p) => p.trim()).filter(Boolean);
35
+ detail.contentSignal.parts = parts;
36
+ score += 4;
37
+ findings.push({
38
+ level: 'pass',
39
+ msg: `Cloudflare Content Signals present: \`${csMatch[1].trim()}\`. Permitted-use declaration is on the record.`,
40
+ });
41
+ const declares = ['search', 'ai-input', 'ai-train'].filter((k) =>
42
+ parts.some((p) => p.startsWith(k + '=')),
43
+ );
44
+ if (declares.length === 3) {
45
+ score += 1;
46
+ findings.push({
47
+ level: 'pass',
48
+ msg: 'All three declarations present (search, ai-input, ai-train) — full triage.',
49
+ });
50
+ } else if (declares.length > 0) {
51
+ findings.push({
52
+ level: 'warn',
53
+ msg: `Content-Signal declares ${declares.length} of 3 signals (${declares.join(', ')}). The unstated signals default to permissive.`,
54
+ });
55
+ }
56
+ }
57
+ }
58
+
59
+ // 2. noai / noimageai meta tags
60
+ const robotsMeta = ($('meta[name="robots"]').attr('content') || '').toLowerCase();
61
+ if (/\bnoai\b/.test(robotsMeta)) {
62
+ detail.noaiMeta = true;
63
+ score += 1;
64
+ findings.push({ level: 'pass', msg: '`<meta name="robots" content="…noai…">` present — page declares no-AI-training.' });
65
+ }
66
+ if (/\bnoimageai\b/.test(robotsMeta)) {
67
+ detail.noimageaiMeta = true;
68
+ score += 1;
69
+ findings.push({ level: 'pass', msg: '`<meta name="robots" content="…noimageai…">` present.' });
70
+ }
71
+
72
+ // 3. X-Robots-Tag HTTP header
73
+ const xrt = (headers?.['x-robots-tag'] || '').toLowerCase();
74
+ if (/\bnoai\b/.test(xrt) || /\bnoimageai\b/.test(xrt)) {
75
+ detail.xRobotsTagNoai = true;
76
+ score += 1;
77
+ findings.push({
78
+ level: 'pass',
79
+ msg: `\`X-Robots-Tag\` header carries noai/noimageai directive(s) (\`${xrt}\`).`,
80
+ });
81
+ }
82
+
83
+ score = Math.min(score, 7);
84
+
85
+ if (score === 0) {
86
+ findings.push({
87
+ level: 'fail',
88
+ msg: 'No declared use-policy signals found. Add Cloudflare Content Signals to robots.txt, or `<meta name="robots" content="noai, noimageai">`, or an `X-Robots-Tag: noai` header — choose your declaration.',
89
+ });
90
+ }
91
+
92
+ return { score, max: 7, findings, detail };
93
+ }
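Check 02 splits the first `Content-Signal` line on commas and looks for `search=`, `ai-input=`, and `ai-train=` prefixes. A sketch of the same parsing that also keeps each declared value is shown here, assuming the comma-separated `key=value` form the check expects. `parseContentSignal` is a hypothetical helper, and the robots.txt text is invented for the example.

```javascript
// Hypothetical helper: parse the first Content-Signal line of a robots.txt
// into a { key: value } object, or return null when no such line exists.
function parseContentSignal(robotsTxt) {
  const m = /^\s*content-signal\s*:\s*(.+)$/im.exec(robotsTxt);
  if (!m) return null;
  const signals = {};
  for (const part of m[1].split(',')) {
    const [key, value] = part.split('=').map((s) => s.trim().toLowerCase());
    if (key && value) signals[key] = value; // ignore malformed fragments
  }
  return signals;
}

// Invented robots.txt content for illustration.
const sampleSignalsTxt =
  'User-agent: *\nContent-Signal: search=yes, ai-input=yes, ai-train=no\nAllow: /';

const signals = parseContentSignal(sampleSignalsTxt);
// The three signal names that check 02 looks for.
const missing = ['search', 'ai-input', 'ai-train'].filter((k) => !(k in signals));
console.log(signals, 'missing:', missing);
```

Keeping the values, rather than only detecting the keys, would let a report show what was declared for each use instead of only that a declaration exists.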