ax-audit 3.0.0 → 3.6.0

Files changed (65)
  1. package/CHANGELOG.md +76 -0
  2. package/README.md +61 -221
  3. package/dist/checks/agent-access.d.ts +16 -0
  4. package/dist/checks/agent-access.d.ts.map +1 -0
  5. package/dist/checks/agent-access.js +110 -0
  6. package/dist/checks/agent-access.js.map +1 -0
  7. package/dist/checks/content-negotiation.d.ts +4 -0
  8. package/dist/checks/content-negotiation.d.ts.map +1 -0
  9. package/dist/checks/content-negotiation.js +138 -0
  10. package/dist/checks/content-negotiation.js.map +1 -0
  11. package/dist/checks/crawl-efficiency.d.ts +4 -0
  12. package/dist/checks/crawl-efficiency.d.ts.map +1 -0
  13. package/dist/checks/crawl-efficiency.js +122 -0
  14. package/dist/checks/crawl-efficiency.js.map +1 -0
  15. package/dist/checks/index.d.ts.map +1 -1
  16. package/dist/checks/index.js +8 -0
  17. package/dist/checks/index.js.map +1 -1
  18. package/dist/checks/robots-txt.d.ts +20 -0
  19. package/dist/checks/robots-txt.d.ts.map +1 -1
  20. package/dist/checks/robots-txt.js +111 -3
  21. package/dist/checks/robots-txt.js.map +1 -1
  22. package/dist/checks/rsl.d.ts +6 -0
  23. package/dist/checks/rsl.d.ts.map +1 -0
  24. package/dist/checks/rsl.js +252 -0
  25. package/dist/checks/rsl.js.map +1 -0
  26. package/dist/cli.d.ts.map +1 -1
  27. package/dist/cli.js +20 -2
  28. package/dist/cli.js.map +1 -1
  29. package/dist/constants.d.ts +17 -0
  30. package/dist/constants.d.ts.map +1 -1
  31. package/dist/constants.js +42 -1
  32. package/dist/constants.js.map +1 -1
  33. package/dist/fetcher.d.ts +7 -3
  34. package/dist/fetcher.d.ts.map +1 -1
  35. package/dist/fetcher.js +68 -30
  36. package/dist/fetcher.js.map +1 -1
  37. package/dist/index.d.ts +2 -1
  38. package/dist/index.d.ts.map +1 -1
  39. package/dist/index.js +1 -0
  40. package/dist/index.js.map +1 -1
  41. package/dist/orchestrator.d.ts +2 -2
  42. package/dist/orchestrator.d.ts.map +1 -1
  43. package/dist/orchestrator.js +13 -6
  44. package/dist/orchestrator.js.map +1 -1
  45. package/dist/reporter/index.d.ts.map +1 -1
  46. package/dist/reporter/index.js +7 -0
  47. package/dist/reporter/index.js.map +1 -1
  48. package/dist/reporter/markdown.d.ts +8 -0
  49. package/dist/reporter/markdown.d.ts.map +1 -0
  50. package/dist/reporter/markdown.js +76 -0
  51. package/dist/reporter/markdown.js.map +1 -0
  52. package/dist/scorer.d.ts.map +1 -1
  53. package/dist/scorer.js +8 -0
  54. package/dist/scorer.js.map +1 -1
  55. package/dist/types.d.ts +17 -2
  56. package/dist/types.d.ts.map +1 -1
  57. package/docs/api.md +200 -0
  58. package/docs/architecture.md +88 -0
  59. package/docs/checks.md +322 -0
  60. package/docs/ci.md +89 -0
  61. package/docs/cli.md +67 -0
  62. package/docs/concepts.md +87 -0
  63. package/docs/faq.md +77 -0
  64. package/docs/getting-started.md +101 -0
  65. package/package.json +2 -1
package/docs/architecture.md ADDED
@@ -0,0 +1,88 @@
1
+ # Architecture
2
+
3
+ ax-audit is a dependency-light TypeScript codebase: two runtime dependencies (`chalk`, `commander`), Node 18+ built-in `fetch`, no HTTP libraries, no XML/HTML parser dependencies (regex-based primitives), and the built-in `node:test` runner.
4
+
5
+ ## Pipeline
6
+
7
+ ```
8
+ cli.ts ──► orchestrator.ts ──► checks/* (Promise.allSettled, parallel)
9
+                   │               │
10
+                   ▼               ▼
11
+              fetcher.ts       scorer.ts ──► reporter/{terminal,json,html,markdown}
12
+           (cache + retries)       │
13
+                   ▲          baseline.ts (save / load / diff)
14
+                   └── shared by every check via CheckContext.fetch
15
+ ```
16
+
17
+ 1. **cli.ts** parses and validates flags, loads the baseline if requested, and dispatches to single or batch mode.
18
+ 2. **orchestrator.ts** (`audit`) creates one fetcher per run, fetches the homepage once, builds the `CheckContext` (`url`, `html`, `headers`, `fetch`), and runs all selected checks in parallel. A check that throws is converted into a score-0 result with the error as a finding — one bad check never kills the audit. `batchAudit` runs `audit` per URL through an order-preserving work queue with configurable `concurrency`.
19
+ 3. **fetcher.ts** wraps `fetch` with: per-run in-memory caching keyed on URL + normalized (lowercased, sorted) custom headers — mirroring HTTP `Vary` semantics so a `text/markdown` probe never collides with the HTML fetch; case-insensitive header merging over defaults; timeouts via `AbortController`; and retries with exponential backoff for transient failures (status 0, 408, 425, 429, 5xx). Errors never throw — they become `{ status: 0, ok: false, error }` results, also cached.
20
+ 4. **checks/** — one module per check (18). Each exports `default` (async check function) and `meta` (`{ id, name, description, weight }`).
21
+ 5. **scorer.ts** computes the weighted average; when all selected checks have weight 0 it falls back to a plain average.
22
+ 6. **reporter/** renders to terminal (chalk), JSON, self-contained HTML, or Markdown.
23
+ 7. **baseline.ts** persists minimal score snapshots and computes per-check diffs for regression gating.
24
+
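The per-header caching described in step 3 can be illustrated with a small sketch. This is not the package's fetcher code: `cacheKey` and `HeaderMap` are hypothetical names, and the sketch assumes that only header names (not values) are lowercased before sorting.

```typescript
// Hypothetical helper, not taken from ax-audit's fetcher.ts.
type HeaderMap = Record<string, string>;

/** Build a cache key from the URL plus lowercased, name-sorted custom headers. */
function cacheKey(url: string, headers: HeaderMap = {}): string {
  const normalized = Object.keys(headers)
    .map((headerName) => [headerName.toLowerCase(), headers[headerName]] as const)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([headerName, value]) => `${headerName}:${value}`)
    .join('\n');
  // A request without custom headers is keyed on the URL alone.
  return normalized ? `${url}\n${normalized}` : url;
}
```

Under this scheme, a probe sent with `Accept: text/markdown` gets a different key than the plain homepage fetch, while header order and casing do not matter.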
25
+ ## Anatomy of a check
26
+
27
+ ```typescript
28
+ import { guideUrl } from '../guide-urls.js';
29
+ import type { CheckContext, CheckResult, CheckMeta, Finding } from '../types.js';
30
+ import { buildResult } from './utils.js';
31
+
32
+ export const meta: CheckMeta = {
33
+   id: 'my-check',
34
+   name: 'My Check',
35
+   description: 'One-line description shown in reports',
36
+   weight: 0, // new checks start informational in 3.x
37
+ };
38
+
39
+ export default async function check(ctx: CheckContext): Promise<CheckResult> {
40
+   const start = performance.now();
41
+   const findings: Finding[] = [];
42
+   let score = 100;
43
+
44
+   const res = await ctx.fetch(`${ctx.url}/something`, { headers: { Accept: 'application/json' } });
45
+   if (!res.ok) {
46
+     findings.push({
47
+       status: 'fail',
48
+       message: '/something not found',
49
+       hint: 'Actionable, copy-pasteable advice.',
50
+       learnMoreUrl: guideUrl(meta.id, 'not-found'),
51
+     });
52
+     return buildResult(meta, 0, findings, start);
53
+   }
54
+
55
+   // ... validations, each pushing a pass/warn/fail Finding and adjusting score
56
+
57
+   return buildResult(meta, score, findings, start);
58
+ }
59
+ ```
60
+
61
+ Conventions:
62
+
63
+ - **Findings are actionable.** Every `warn`/`fail` carries a `hint` with concrete remediation and a `learnMoreUrl` pointing to `lucioduran.com/projects/ax-audit/guides/<check-id>#<anchor>`. Every anchor must have a section in that guide.
64
+ - **Scores are clamped** to [0, 100] by `buildResult`.
65
+ - **Shared HTML primitives** live in `checks/html-utils.ts` (`getMetaContent`, `findLinkTags`, `getAttribute`, `extractVisibleText`, …) — no per-check regex duplication.
66
+ - **Content-Type validation** uses `checkContentType` from `checks/utils.ts` (−5 convention for mismatches).
67
+ - **Network goes through `ctx.fetch`** — never raw `fetch` — so caching, retries, timeouts, and `--verbose` logging apply uniformly.
68
+
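The clamping convention can be sketched as follows. The real `buildResult` lives in `checks/utils.ts` and its body is not shown in this diff; `clampScore`, `buildResultSketch`, and the `Sketch*` types below are simplified, hypothetical stand-ins.

```typescript
// Simplified stand-ins for the types in src/types.ts (assumed shapes).
interface SketchFinding {
  status: 'pass' | 'warn' | 'fail';
  message: string;
  hint?: string;
}
interface SketchMeta {
  id: string;
  weight: number;
}
interface SketchResult {
  id: string;
  score: number;
  findings: SketchFinding[];
}

/** Clamp a raw score into the documented [0, 100] range. */
function clampScore(raw: number): number {
  return Math.min(100, Math.max(0, raw));
}

/** Assemble a result; stacked deductions or bonuses cannot escape the range. */
function buildResultSketch(meta: SketchMeta, rawScore: number, findings: SketchFinding[]): SketchResult {
  return { id: meta.id, score: clampScore(rawScore), findings };
}
```

Clamping in one place means individual checks can subtract and add freely without guarding every step.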
69
+ ## Adding a new check
70
+
71
+ 1. Create `src/checks/your-check.ts` exporting `default` + `meta` (weight 0 — see scoring policy below).
72
+ 2. Register it in `src/checks/index.ts`.
73
+ 3. Add its weight to `CHECK_WEIGHTS` in `src/constants.ts`.
74
+ 4. Add a test suite in `test/checks/your-check.test.js` using `mockContext` / `mockResponse` from `test/helpers.js`. Route values can be functions `(url, fetchOptions) => response` when the response must vary by request headers.
75
+ 5. Document it in `docs/checks.md` and the README table.
76
+ 6. Write the remediation guide covering every `learnMoreUrl` anchor you emit.
77
+
78
+ ## Scoring policy (3.x)
79
+
80
+ Score deltas on the same site are treated as **breaking** (see CHANGELOG 3.0.0). Therefore in 3.x:
81
+
82
+ - New checks ship with **weight 0** (informational): full findings, no effect on the overall score or baselines.
83
+ - New findings inside weighted checks must be informational (no score deduction) — see the Content Signals findings in `robots-txt`.
84
+ - Weight redistribution happens in major versions (v4.0).
85
+
86
+ ## Testing
87
+
88
+ `npm test` builds (`tsc`) and runs `node --test`. The suite (301 tests) covers every check, the scorer, baseline logic, the Markdown reporter, plus integration tests that spin up real local HTTP servers for the fetcher (per-header caching, retries) and the batch orchestrator (ordering, concurrency caps). No test dependencies beyond Node.
package/docs/checks.md ADDED
@@ -0,0 +1,322 @@
1
+ # Checks Reference
2
+
3
+ ax-audit runs 18 checks. Fourteen are **weighted** (summing to 100% of the overall score); four are **informational** in 3.x — they run and report findings but carry weight 0 until v4.0, because score-affecting changes are treated as breaking (see [CHANGELOG 3.0.0](../CHANGELOG.md)).
4
+
5
+ This page documents the **exact scoring** of every check: each deduction, bonus, and formula, extracted from the source. Every finding links to a step-by-step remediation guide at `lucioduran.com/projects/ax-audit/guides/<check-id>`.
6
+
7
+ **Reading the tables:** each check starts at 100 unless noted. Deductions stack additively; `buildResult` clamps the final score to [0, 100]. "Hard fail" rows short-circuit the check.
8
+
9
+ ---
10
+
11
+ ## Weighted checks
12
+
13
+ ### `llms-txt` — 11%
14
+
15
+ `/llms.txt` presence and [llmstxt.org](https://llmstxt.org) spec compliance.
16
+
17
+ | Condition | Points |
18
+ | --- | --- |
19
+ | `/llms.txt` not found | **hard fail → 0** |
20
+ | Wrong Content-Type (expected `text/plain` or `text/markdown`) | −5 |
21
+ | First line is not an H1 (`# `) | −15 |
22
+ | No blockquote description (`> `) | −10 |
23
+ | No `##` section headings | −10 |
24
+ | No Markdown links | −10 |
25
+ | Content under 100 characters | −10 |
26
+ | `/llms-full.txt` also available | **+10** (capped at 100) |
27
+
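As a worked illustration of the table above, the sketch below applies the same deductions to an already-fetched file. It is not the package's implementation: `scoreLlmsTxt` and `LlmsTxtInput` are hypothetical, and the regular expressions are one plausible reading of each rule.

```typescript
// Hypothetical input shape: the body and headers of an already-fetched /llms.txt.
interface LlmsTxtInput {
  body: string;
  contentType: string;
  hasLlmsFull: boolean; // whether /llms-full.txt was also found
}

function scoreLlmsTxt(input: LlmsTxtInput): number {
  const text = input.body.trim();
  const lines = text.split(/\r?\n/);
  let score = 100;

  if (!/^text\/(plain|markdown)\b/i.test(input.contentType)) score -= 5; // wrong Content-Type
  if (!/^# \S/.test(lines[0] ?? '')) score -= 15; // first line is not an H1
  if (!lines.some((line) => /^> \S/.test(line))) score -= 10; // no blockquote description
  if (!lines.some((line) => /^## \S/.test(line))) score -= 10; // no section headings
  if (!/\[[^\]]+\]\([^)\s]+\)/.test(text)) score -= 10; // no Markdown links
  if (text.length < 100) score -= 10; // content under 100 characters
  if (input.hasLlmsFull) score += 10; // bonus, capped below

  return Math.min(100, Math.max(0, score));
}
```

A five-character body served as `text/html` triggers every deduction and lands at 40; a well-formed file cannot exceed 100 even with the `/llms-full.txt` bonus.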
28
+ ### `robots-txt` — 11%
29
+
30
+ AI-crawler configuration. Core crawlers: GPTBot, ClaudeBot, ChatGPT-User, Claude-SearchBot, Google-Extended, PerplexityBot, OAI-SearchBot, CCBot.
31
+
32
+ | Condition | Points |
33
+ | --- | --- |
34
+ | `/robots.txt` not found | **hard fail → 0** |
35
+ | No core AI crawler explicitly configured | −40 |
36
+ | Some core crawlers missing | −`round(missing/8 × 30)` |
37
+ | Core crawler(s) blocked only via `User-agent: *` + `Disallow: /` | −5 per crawler |
38
+ | Known AI crawler(s) explicitly blocked (`Disallow: /`) | −3 per crawler |
39
+ | No `Sitemap:` directive | −5 |
40
+ | Partial path restrictions on AI crawlers | warn only, 0 |
41
+ | [Content Signals](https://contentsignals.org) findings (declared / malformed / unknown / missing) | informational, 0 in 3.x |
42
+
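The two crawler-coverage rows can be read as a single function of how many core crawlers are missing. The sketch below assumes case-insensitive matching of user-agent names; `missingCrawlerDeduction` is a hypothetical helper, not the package's code.

```typescript
// The eight core crawlers listed above.
const CORE_CRAWLERS = [
  'GPTBot',
  'ClaudeBot',
  'ChatGPT-User',
  'Claude-SearchBot',
  'Google-Extended',
  'PerplexityBot',
  'OAI-SearchBot',
  'CCBot',
];

/** Points deducted for core crawlers that robots.txt does not explicitly configure. */
function missingCrawlerDeduction(configuredAgents: string[]): number {
  const seen = new Set(configuredAgents.map((agent) => agent.toLowerCase()));
  const missing = CORE_CRAWLERS.filter((crawler) => !seen.has(crawler.toLowerCase())).length;
  if (missing === CORE_CRAWLERS.length) return 40; // no core crawler configured at all
  return Math.round((missing / CORE_CRAWLERS.length) * 30);
}
```

For example, a robots.txt that names five of the eight core crawlers is missing three, so the deduction is `round(3/8 × 30) = 11`.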
43
+ ### `html-rendering` — 9%
44
+
45
+ Whether the static HTML contains content — most AI crawlers do not execute JavaScript. Thresholds: 500 chars / 80 words of visible text, 5% text-to-markup ratio.
46
+
47
+ | Condition | Points |
48
+ | --- | --- |
49
+ | No HTML body returned | **hard fail → 0** |
50
+ | Zero visible text in static HTML | −50 |
51
+ | Sparse content (< 500 chars or < 80 words) | −25 |
52
+ | Text-to-markup ratio < 5% | −10 |
53
+ | Empty SPA mount point (`#root`, `#__next`, `#__nuxt`, `#app`, `#svelte`, `#gatsby`) | −20 |
54
+ | 0 semantic landmarks (`<main>`, `<article>`, `<header>`, `<footer>`, `<nav>`) | −15 |
55
+ | 1–2 semantic landmarks | −10 |
56
+ | No `<h1>` | −10 |
57
+ | Multiple or empty `<h1>` | −5 |
58
+ | > 15 executable scripts without `<noscript>` fallback | −5 |
59
+ | `<img alt>` coverage < 90% | −5 |
60
+
61
+ ### `structured-data` — 9%
62
+
63
+ JSON-LD on the homepage. Key entity types: Person, Organization, WebSite, WebPage, ProfilePage.
64
+
65
+ | Condition | Points |
66
+ | --- | --- |
67
+ | No JSON-LD blocks | **hard fail → 0** |
68
+ | Every JSON-LD block has invalid JSON | **→ 10** |
69
+ | Invalid JSON in a block | −10 per block |
70
+ | No schema.org `@context` | −15 |
71
+ | No key entity types found | −15 |
72
+ | Only one key entity type | −10 |
73
+ | No `@graph` array | −5 |
74
+ | No `BreadcrumbList` | −5 |
75
+
76
+ ### `http-headers` — 9%
77
+
78
+ Security headers, AI discovery `Link` headers (parsed per RFC 5988, since superseded by RFC 8288), CORS on `.well-known`.
79
+
80
+ | Condition | Points |
81
+ | --- | --- |
82
+ | No headers retrievable | **hard fail → 0** |
83
+ | Missing critical security header (HSTS, X-Content-Type-Options) | −10 each |
84
+ | Only 1–3 of the 7 tracked security headers present | −5 |
85
+ | `Link` header missing both llms.txt and agent.json references | −15 |
86
+ | `Link` header missing one of the two | −5 |
87
+ | No CORS on `/.well-known/agent.json` | −10 |
88
+
89
+ ### `agent-json` — 7%
90
+
91
+ `/.well-known/agent.json` [A2A Agent Card](https://a2a-protocol.org). Required fields: `name`, `description`, `url`, `skills`.
92
+
93
+ | Condition | Points |
94
+ | --- | --- |
95
+ | Not found | **hard fail → 0** |
96
+ | Invalid JSON | **→ 10** |
97
+ | Wrong Content-Type (expected `application/json`) | −5 |
98
+ | Missing required field | −15 per field |
99
+ | `url` on a different origin | −5 |
100
+ | `url` not an absolute URL | −5 |
101
+ | `skills` empty | −10 |
102
+ | `skills` entries missing `id` or `description` | −5 |
103
+ | No `protocolVersion` | −5 |
104
+ | No optional fields (`capabilities`, `authentication`, `documentationUrl`) | −5 |
105
+
106
+ ### `mcp` — 7%
107
+
108
+ `/.well-known/mcp.json` [Model Context Protocol](https://modelcontextprotocol.io) server configuration.
109
+
110
+ | Condition | Points |
111
+ | --- | --- |
112
+ | Not found | **hard fail → 0** |
113
+ | Invalid JSON | **→ 10** |
114
+ | Wrong Content-Type | −5 |
115
+ | Missing `name` | −10 |
116
+ | Missing `description` | −5 |
117
+ | No `tools` array, or empty | −15 |
118
+ | No tool has a description | −10 |
119
+ | Some tools missing descriptions | −5 |
120
+ | No `resources` | −5 |
121
+ | No protocol version | −5 |
122
+ | No CORS headers | −10 |
123
+
124
+ ### `seo-basics` — 7%
125
+
126
+ Head-tag fundamentals. Bounds: title 20–70 chars, description 70–160.
127
+
128
+ | Condition | Points |
129
+ | --- | --- |
130
+ | Homepage HTML unavailable | **hard fail → 0** |
131
+ | `<title>` missing or empty | −25 |
132
+ | Title too short / too long | −10 / −5 |
133
+ | Meta description missing | −20 |
134
+ | Description too short / too long | −8 / −5 |
135
+ | Description duplicates the title | −5 |
136
+ | No canonical link | −10 |
137
+ | Multiple canonicals / missing href / relative href | −5 each |
138
+ | `<html lang>` missing / invalid BCP 47 | −10 / −5 |
139
+ | No UTF-8 charset | −5 |
140
+ | Missing viewport | −5 |
141
+ | hreflang present without `x-default` | −3 |
142
+
143
+ ### `security-txt` — 6%
144
+
145
+ `/.well-known/security.txt` per [RFC 9116](https://www.rfc-editor.org/rfc/rfc9116).
146
+
147
+ | Condition | Points |
148
+ | --- | --- |
149
+ | Not found | **hard fail → 0** |
150
+ | Missing `Contact` or `Expires` | −25 per field |
151
+ | `Expires` in the past | −20 |
152
+ | No optional fields (Canonical, Preferred-Languages, Policy, Encryption, Hiring) | −5 |
153
+
154
+ ### `meta-tags` — 6%
155
+
156
+ AI meta tags (`ai:summary`, `ai:content_type`, `ai:author`, `ai:api`, `ai:agent_card`), discovery links, Open Graph, Twitter Card.
157
+
158
+ | Condition | Points |
159
+ | --- | --- |
160
+ | Homepage HTML unavailable | **hard fail → 0** |
161
+ | 0 AI meta tags | −18 |
162
+ | Only 1–2 AI meta tags | −12 |
163
+ | No `rel="alternate"` → llms.txt | −12 |
164
+ | No `rel="alternate"` → agent.json | −8 |
165
+ | No `rel="me"` identity links | −8 |
166
+ | No Open Graph tags at all | −12 |
167
+ | OG required incomplete (`og:title`, `og:description`, `og:url`, `og:type`) | −8 |
168
+ | OG recommended incomplete (`og:image`, `og:site_name`) | −3 |
169
+ | No Twitter Card tags at all | −6 |
170
+ | Twitter required incomplete (`twitter:card`, `twitter:title`, `twitter:description`) | −5 |
171
+ | Twitter recommended incomplete (`twitter:image`) | −2 |
172
+
173
+ ### `openapi` — 6%
174
+
175
+ `/.well-known/openapi.json`.
176
+
177
+ | Condition | Points |
178
+ | --- | --- |
179
+ | Not found | **hard fail → 0** |
180
+ | Invalid JSON | **→ 10** |
181
+ | Wrong Content-Type | −5 |
182
+ | No `openapi`/`swagger` version field | −20 |
183
+ | Swagger 2.x instead of OpenAPI 3.x | −10 |
184
+ | Missing `info.title` | −10 |
185
+ | Missing `info.description` | −5 |
186
+ | No `paths` documented | −15 |
187
+ | No `servers` | −5 |
188
+
189
+ ### `tls-https` — 5%
190
+
191
+ HTTPS, redirect, HSTS. Thresholds: max-age ≥ 15,768,000s (~6 months), preload ≥ 31,536,000s (1 year).
192
+
193
+ | Condition | Points |
194
+ | --- | --- |
195
+ | Invalid URL | **hard fail → 0** |
196
+ | Served over plain HTTP | −50 |
197
+ | HTTP does not redirect to HTTPS | −15 |
198
+ | Redirect unverifiable | −5 |
199
+ | No HSTS header | −15 |
200
+ | HSTS without `max-age` | −10 |
201
+ | `max-age` < 6 months | −5 |
202
+ | No `includeSubDomains` | −5 |
203
+ | `preload` present but ineligible | −5 |
204
+ | No `preload` directive | −3 |
205
+
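The HSTS rows of this table can be sketched as a pure function over the header value. This is an illustration only: `hstsDeduction` is a hypothetical name, the way the rows stack is an assumption, and preload eligibility is reduced here to the one-year `max-age` threshold stated above.

```typescript
const MIN_MAX_AGE = 15768000; // ~6 months, in seconds
const PRELOAD_MIN_MAX_AGE = 31536000; // 1 year, in seconds

/** Points deducted for the Strict-Transport-Security header (null when absent). */
function hstsDeduction(header: string | null): number {
  if (!header) return 15; // no HSTS header

  const directives = header
    .split(';')
    .map((directive) => directive.trim().toLowerCase())
    .filter((directive) => directive.length > 0);

  const maxAgeDirective = directives.find((directive) => directive.startsWith('max-age='));
  const maxAge = maxAgeDirective
    ? Number.parseInt(maxAgeDirective.slice('max-age='.length).replace(/"/g, ''), 10)
    : Number.NaN;

  let deduction = 0;
  if (Number.isNaN(maxAge)) deduction += 10; // HSTS without max-age
  else if (maxAge < MIN_MAX_AGE) deduction += 5; // max-age under ~6 months
  if (!directives.includes('includesubdomains')) deduction += 5;
  if (!directives.includes('preload')) deduction += 3;
  else if (Number.isNaN(maxAge) || maxAge < PRELOAD_MIN_MAX_AGE) deduction += 5; // preload but ineligible
  return deduction;
}
```

Under these assumptions, `max-age=31536000; includeSubDomains; preload` costs nothing, while a six-month `max-age` with `preload` costs 5 for the ineligible preload.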
206
+ ### `sitemap` — 4%
207
+
208
+ Located via robots.txt `Sitemap:` or `/sitemap.xml`. Limits: 50,000 URLs / 50 MB / 365-day freshness.
209
+
210
+ | Condition | Points |
211
+ | --- | --- |
212
+ | No sitemap found | **hard fail → 0** |
213
+ | Response is not XML | **→ 20** |
214
+ | Over 50 MB | −10 |
215
+ | Unexpected Content-Type | −5 |
216
+ | Sitemap index with no `<sitemap>` entries | −20, stop |
217
+ | Some sampled child sitemaps unreachable | −10 |
218
+ | `<urlset>` with no `<url>` entries | −30 |
219
+ | Over 50,000 URLs declared | −10 |
220
+ | `<lastmod>` coverage < 50% | −5 |
221
+ | Newest `<lastmod>` older than 365 days | −5 |
222
+
223
+ ### `well-known-ai` — 3%
224
+
225
+ Emerging AI discovery files. **Purely proportional** — no deductions:
226
+
227
+ ```
228
+ score = round(present / 5 × 100)
229
+ ```
230
+
231
+ where `present` counts how many of these five files are found: `/.well-known/ai.txt` (Spawning), `/.well-known/genai.txt`, `/ai-plugin.json`, `/agents.json`, `/.well-known/nlweb.json`. Files with invalid content produce warnings without counting as present.
232
+
233
+ ---
234
+
235
+ ## Informational checks (weight 0 in 3.x)
236
+
237
+ These run on every audit and report full findings, but do not affect the overall score or baselines. They gain weight in v4.0.
238
+
239
+ ### `content-negotiation` — Markdown for Agents
240
+
241
+ Probes the homepage with `Accept: text/markdown` — the pattern served by Cloudflare and Vercel and requested by Claude Code, Cursor, and OpenCode (~80% token reduction vs HTML).
242
+
243
+ | Condition | Points |
244
+ | --- | --- |
245
+ | Probe request fails (network) | **hard fail → 0** |
246
+ | No Markdown served, no fallback | **→ 0** |
247
+ | No Markdown served, but `<link rel="alternate" type="text/markdown">` present | **→ 40** |
248
+ | Markdown served (correct Content-Type, 2xx) | base 100 |
249
+ | Body is empty | −30 |
250
+ | Body is a relabeled HTML document | −25 |
251
+ | `Vary` does not include `Accept` | −15 |
252
+ | Markdown not smaller than HTML | warn only, 0 |
253
+
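The decision table above can be sketched as a function over an already-captured probe response. `scoreMarkdownProbe` and `ProbeResponse` are hypothetical; in particular, the test for a "relabeled HTML document" is a guess at what such a heuristic might look like, and the network hard-fail row is left out.

```typescript
// Hypothetical capture of the response to a request sent with `Accept: text/markdown`.
interface ProbeResponse {
  status: number;
  contentType: string;
  vary: string; // raw Vary header value, '' when absent
  body: string;
}

function scoreMarkdownProbe(probe: ProbeResponse, htmlHasMarkdownAlternate: boolean): number {
  const is2xx = probe.status >= 200 && probe.status < 300;
  const servedMarkdown = is2xx && /^text\/markdown\b/i.test(probe.contentType);
  if (!servedMarkdown) return htmlHasMarkdownAlternate ? 40 : 0;

  let score = 100;
  const body = probe.body.trim();
  if (body.length === 0) score -= 30; // empty body
  else if (/^<!doctype html|^<html[\s>]/i.test(body)) score -= 25; // HTML relabeled as Markdown

  const varyTokens = probe.vary.split(',').map((token) => token.trim().toLowerCase());
  if (!varyTokens.includes('accept')) score -= 15; // caches cannot tell the variants apart
  return Math.max(0, score);
}
```

The `Vary` rule matters because a shared cache that ignores `Accept` could hand the Markdown variant to a browser, or the HTML variant to an agent.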
254
+ ### `rsl` — Really Simple Licensing
255
+
256
+ [RSL 1.0](https://rslstandard.org/rsl) discovery (robots.txt `License:`, `Link: rel="license"` header, `<link rel="license" type="application/rsl+xml">`) and document validation. Plain CC-style license links without the RSL media type are ignored.
257
+
258
+ | Condition | Points |
259
+ | --- | --- |
260
+ | No discovery mechanism found | **hard fail → 0** |
261
+ | License document unreachable | **→ 25** (cap) |
262
+ | Root `<rsl>` element missing | −40, stop |
263
+ | No `<content>` elements | −20, stop |
264
+ | Wrong or missing `https://rslstandard.org/rsl` namespace | −15 |
265
+ | `<license>` elements missing | −15 |
266
+ | robots.txt `License:` not an absolute URI | −10 |
267
+ | `<content>` missing required `url` attribute | −10 |
268
+ | Wrong Content-Type (expected `application/rsl+xml`) | −5 |
269
+ | `permits`/`prohibits` with invalid `type` | −5 |
270
+ | Tokens outside the RSL 1.0 vocabulary (incl. pre-1.0 draft tokens) | −5 |
271
+ | Invalid `payment` type | −5 |
272
+
273
+ ### `agent-access` — Cloaking detection
274
+
275
+ Probes the homepage with realistic UAs for the 8 core AI crawlers and compares status + visible text against the default-UA baseline. **Credit-ratio formula:**
276
+
277
+ ```
278
+ score = round(credit / 8 × 100)
279
+ ```
280
+
281
+ | Outcome per crawler | Credit |
282
+ | --- | --- |
283
+ | Equivalent response | 1 |
284
+ | Blocked, consistent with robots.txt `Disallow` (explicit or wildcard) | 1 |
285
+ | 200 but < 50% of baseline visible text (baseline ≥ 200 chars) | 0.5 |
286
+ | Blocked while robots.txt allows (or doesn't restrict) it | 0 |
287
+ | Baseline request itself fails | **hard fail → 0** |
288
+
289
+ Caveat: WAFs using Web Bot Auth / IP verification may pass the real crawler while rejecting this unverified probe — confirm against WAF logs before changing rules.
290
+
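A short worked example of the credit-ratio formula, using hypothetical outcome labels for the four per-crawler rows (the baseline hard fail is outside this sketch):

```typescript
// Hypothetical labels for the per-crawler outcomes in the table above.
type ProbeOutcome = 'equivalent' | 'blocked-consistent' | 'thin-content' | 'blocked-inconsistent';

const OUTCOME_CREDIT: Record<ProbeOutcome, number> = {
  equivalent: 1,
  'blocked-consistent': 1,
  'thin-content': 0.5,
  'blocked-inconsistent': 0,
};

/** score = round(credit / 8 × 100), with one outcome per core crawler. */
function agentAccessScore(outcomes: ProbeOutcome[]): number {
  const credit = outcomes.reduce((sum, outcome) => sum + OUTCOME_CREDIT[outcome], 0);
  return Math.round((credit / 8) * 100);
}
```

Six equivalent responses, one thin response, and one inconsistent block give a credit of 6.5, so the score is `round(6.5 / 8 × 100) = 81`.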
291
+ ### `crawl-efficiency`
292
+
293
+ | Condition | Points |
294
+ | --- | --- |
295
+ | Homepage request fails | **hard fail → 0** |
296
+ | Uncompressed response | −30 |
297
+ | gzip/deflate/zstd instead of Brotli | pass with suggestion, 0 |
298
+ | No `ETag` / `Last-Modified` validator | −30 |
299
+ | Validator present but conditional request not answered with `304` | −15 |
300
+ | Page > 2 MB decompressed | −10 |
301
+ | Page > 500 KB decompressed | −5 |
302
+
303
+ ---
304
+
305
+ ## Overall scoring model
306
+
307
+ Each check returns 0–100. The overall score is the weighted average across the checks that ran:
308
+
309
+ ```
310
+ overall = round( Σ (score_i / 100 × weight_i) / Σ weight_i × 100 )
311
+ ```
312
+
313
+ When every selected check has weight 0 (e.g. `--checks rsl`), the overall falls back to a plain average of check scores.
314
+
315
+ | Grade | Score | Exit code |
316
+ | --- | --- | --- |
317
+ | Excellent | 90–100 | 0 |
318
+ | Good | 70–89 | 0 |
319
+ | Fair | 50–69 | 1 |
320
+ | Poor | 0–49 | 1 |
321
+
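The formula, the weight-0 fallback, and the grade table can be combined into one small sketch. The function names and the `ScoredCheck` shape are hypothetical, and returning 0 for an empty check list is an assumption not stated in the docs.

```typescript
interface ScoredCheck {
  id: string;
  score: number; // 0 to 100
  weight: number; // 0 for informational checks
}

function overallScore(checks: ScoredCheck[]): number {
  if (checks.length === 0) return 0; // assumption: nothing ran
  const totalWeight = checks.reduce((sum, check) => sum + check.weight, 0);
  if (totalWeight === 0) {
    // Every selected check is informational: fall back to a plain average.
    return Math.round(checks.reduce((sum, check) => sum + check.score, 0) / checks.length);
  }
  const weighted = checks.reduce((sum, check) => sum + (check.score / 100) * check.weight, 0);
  return Math.round((weighted / totalWeight) * 100);
}

function gradeFor(score: number): 'Excellent' | 'Good' | 'Fair' | 'Poor' {
  if (score >= 90) return 'Excellent';
  if (score >= 70) return 'Good';
  if (score >= 50) return 'Fair';
  return 'Poor';
}

/** Exit code from the score alone; regression gating and fatal errors are separate. */
function exitCodeFor(score: number): 0 | 1 {
  return score >= 70 ? 0 : 1;
}
```

Note that a weight-0 check contributes nothing to either sum, which is why informational checks cannot move the overall score when at least one weighted check is selected.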
322
+ Weights live in `src/constants.ts` (`CHECK_WEIGHTS`); a check's own `meta.weight` takes precedence. The scoring policy for 3.x — why new checks ship at weight 0 — is documented in [architecture.md](./architecture.md).
package/docs/ci.md ADDED
@@ -0,0 +1,89 @@
1
+ # CI Integration
2
+
3
+ ax-audit's exit codes (see [cli.md](./cli.md)) make it a drop-in quality gate: `0` for Good/Excellent, `1` for Fair/Poor, invalid arguments, or regressions, and `2` for fatal errors.
4
+
5
+ ## GitHub Actions
6
+
7
+ ### Basic gate
8
+
9
+ ```yaml
10
+ - name: AX Audit
11
+   run: npx ax-audit https://your-site.com
12
+   # Fails the step if the score < 70
13
+ ```
14
+
15
+ ### Regression gate with a committed baseline
16
+
17
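+ Commit `.ax-baseline.json` to the repo and fail the build when any check drops more than 5 points below the baseline (the overall score < 70 gate described in [cli.md](./cli.md) still applies):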
18
+
19
+ ```yaml
20
+ - name: AX Audit (regression gate)
21
+   run: npx ax-audit https://your-site.com --baseline .ax-baseline.json --fail-on-regression 5
22
+ ```
23
+
24
+ Refresh the baseline deliberately (e.g., after intentional changes):
25
+
26
+ ```bash
27
+ npx ax-audit https://your-site.com --save-baseline .ax-baseline.json
28
+ git add .ax-baseline.json && git commit -m "chore: refresh AX baseline"
29
+ ```
30
+
31
+ ### Markdown report as a PR comment
32
+
33
+ ```yaml
34
+ - name: AX Audit (markdown)
35
+   run: npx ax-audit ${{ env.PREVIEW_URL }} --output markdown > ax-report.md
36
+   continue-on-error: true
37
+
38
+ - name: Comment PR
39
+   uses: marocchino/sticky-pull-request-comment@v2
40
+   with:
41
+     path: ax-report.md
42
+ ```
43
+
44
+ This pairs naturally with Vercel/Netlify preview deployments: audit the preview URL on every PR and the reviewer sees the AX impact inline.
45
+
46
+ ### Artifacts
47
+
48
+ ```yaml
49
+ - name: AX Audit (JSON)
50
+   run: npx ax-audit https://your-site.com --json > ax-report.json
51
+
52
+ - uses: actions/upload-artifact@v4
53
+   with:
54
+     name: ax-audit-report
55
+     path: ax-report.json
56
+ ```
57
+
58
+ ## Auditing multiple environments
59
+
60
+ ```yaml
61
+ - name: AX Audit (all properties)
62
+   run: npx ax-audit https://www.your-site.com https://docs.your-site.com https://api.your-site.com --concurrency 3
63
+   # Exit 1 if any property scores < 70
64
+ ```
65
+
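Batch mode is documented as an order-preserving work queue with a concurrency cap. One common way to get that behavior is sketched below; `mapWithConcurrency` is a hypothetical helper, not the package's `batchAudit`.

```typescript
/**
 * Run `worker` over `items` with at most `concurrency` calls in flight.
 * Results are written by input index, so output order matches input order
 * regardless of which call finishes first.
 */
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  concurrency: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0; // shared cursor; safe because JavaScript runs one callback at a time

  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const lanes = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: lanes }, () => runLane()));
  return results;
}
```

Each lane pulls the next unclaimed index as soon as it is free, so a slow URL delays only its own lane rather than the whole batch.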
66
+ ## Tuning for CI stability
67
+
68
+ - `--retries 3` absorbs transient 5xx/timeouts from cold preview deployments (default is 2).
69
+ - `--timeout 15000` for slow staging environments.
70
+ - `--checks ...` to gate only on the surface you are iterating on — but remember the overall score then averages only the selected checks.
71
+
72
+ ## Scheduled audits
73
+
74
+ A weekly audit catches drift from infrastructure changes (CDN settings, WAF rules, header changes deployed by other teams):
75
+
76
+ ```yaml
77
+ on:
78
+   schedule:
79
+     - cron: '0 6 * * 1'
80
+
81
+ jobs:
82
+   ax-audit:
83
+     runs-on: ubuntu-latest
84
+     steps:
85
+       - uses: actions/checkout@v4
86
+       - run: npx ax-audit https://your-site.com --baseline .ax-baseline.json --fail-on-regression 0
87
+ ```
88
+
89
+ `--fail-on-regression 0` makes any per-check drop fail the workflow — appropriate for scheduled runs where every change is unexpected.
package/docs/cli.md ADDED
@@ -0,0 +1,67 @@
1
+ # CLI Reference
2
+
3
+ ```bash
4
+ ax-audit <urls...> [options]
5
+ ```
6
+
7
+ One or more fully qualified URLs (scheme required). A single URL produces a full report; multiple URLs run in batch mode with a summary table.
8
+
9
+ ## Options
10
+
11
+ | Flag | Default | Description |
12
+ | --- | --- | --- |
13
+ | `--output <format>` | `terminal` | Output format: `terminal`, `json`, `html`, `markdown`. Invalid values error out. |
14
+ | `--json` | — | Shorthand for `--output json`. |
15
+ | `--checks <list>` | all | Comma-separated check IDs to run (see [checks.md](./checks.md)). Unknown IDs error with the list of valid ones. |
16
+ | `--timeout <ms>` | `10000` | Per-request timeout in milliseconds. |
17
+ | `--retries <n>` | `2` | Retry attempts for transient fetch failures (network errors, timeouts, 408/425/429/5xx) with exponential backoff from 250ms. `0` disables retries. |
18
+ | `--concurrency <n>` | `1` | Batch mode only: maximum URLs audited in parallel. Output order always matches input order. |
19
+ | `--verbose` | — | Log every HTTP request, cache hit, retry, and per-check score to stderr. |
20
+ | `--only-failures` | — | Hide passing findings; checks with only passes are omitted entirely. |
21
+ | `--save-baseline <path>` | — | Save this audit as a baseline JSON file. |
22
+ | `--baseline <path>` | — | Compare against a saved baseline; shows per-check deltas (▲/▼). Single-URL mode only. |
23
+ | `--fail-on-regression <points>` | — | Exit 1 if any check regresses more than N points vs the baseline. Requires `--baseline`. |
24
+ | `-v, --version` | — | Print version. |
25
+
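To make the `--retries` row concrete, the sketch below lists which statuses count as transient and what a backoff schedule starting at 250ms could look like. The doubling factor and the absence of jitter are assumptions; the docs only state "exponential backoff from 250ms". Both function names are hypothetical.

```typescript
// Status 0 stands for a network error or timeout, per the fetcher description.
const TRANSIENT_STATUSES = new Set([0, 408, 425, 429]);

function isTransient(statusCode: number): boolean {
  return TRANSIENT_STATUSES.has(statusCode) || (statusCode >= 500 && statusCode <= 599);
}

/** Delay before each retry attempt, assuming the delay doubles every time. */
function backoffDelays(retries: number, baseMs: number = 250): number[] {
  return Array.from({ length: Math.max(0, retries) }, (_, attempt) => baseMs * 2 ** attempt);
}
```

With the default of two retries, this schedule waits 250ms and then 500ms, so a request that keeps failing adds under a second of delay on top of its timeouts.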
26
+ ## Output formats
27
+
28
+ - **terminal** — colored report with score bar, per-check sections, and PASS/WARN/FAIL findings.
29
+ - **json** — the full `AuditReport` (plus `baselineDiff` when `--baseline` is used). Stable shape for CI pipelines.
30
+ - **html** — self-contained page (score gauge, dark/light mode, collapsible sections). Pipe to a file: `ax-audit <url> --output html > report.html`.
31
+ - **markdown** — summary table + per-check findings with status emoji. Built for CI logs and PR comments: `ax-audit <url> --output markdown > report.md`.
32
+
33
+ ## Exit codes
34
+
35
+ | Code | Meaning |
36
+ | --- | --- |
37
+ | `0` | Score ≥ 70 (single), or all URLs ≥ 70 (batch), and no regression beyond the `--fail-on-regression` threshold. |
38
+ | `1` | Score < 70, any batch URL < 70, invalid arguments, or regression beyond threshold. |
39
+ | `2` | Fatal error (network failure on the audit itself, unreadable baseline file). |
40
+
41
+ ## Baseline workflow
42
+
43
+ ```bash
44
+ # First run — record the baseline
45
+ ax-audit https://your-site.com --save-baseline .ax-baseline.json
46
+
47
+ # Subsequent runs — compare and gate
48
+ ax-audit https://your-site.com --baseline .ax-baseline.json --fail-on-regression 5
49
+ ```
50
+
51
+ The baseline stores the overall score and per-check scores. Checks added after the baseline was saved appear as new (no delta); removed checks are ignored.
52
+
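The comparison rules in the paragraph above can be sketched as follows. The `BaselineSnapshot` shape and both function names are hypothetical; the actual file format written by `--save-baseline` is not shown in this diff.

```typescript
// Hypothetical snapshot shape: overall score plus per-check scores.
interface BaselineSnapshot {
  overall: number;
  checks: Record<string, number>;
}

interface CheckDelta {
  id: string;
  before: number;
  after: number;
  delta: number; // negative means the check regressed
}

/** Per-check deltas; checks new since the baseline get no delta, removed ones are ignored. */
function diffBaseline(baseline: BaselineSnapshot, current: BaselineSnapshot): CheckDelta[] {
  const deltas: CheckDelta[] = [];
  for (const id of Object.keys(current.checks)) {
    const before = baseline.checks[id];
    const after = current.checks[id];
    if (before === undefined || after === undefined) continue;
    deltas.push({ id, before, after, delta: after - before });
  }
  return deltas;
}

/** True when any check dropped by more than `threshold` points. */
function hasRegression(deltas: CheckDelta[], threshold: number): boolean {
  return deltas.some((entry) => -entry.delta > threshold);
}
```

Because the comparison is "more than N points", a threshold of 0 fails on any drop at all, while a threshold of 5 tolerates a drop of exactly 5.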
53
+ ## Examples
54
+
55
+ ```bash
56
+ # Quick audit
57
+ npx ax-audit https://your-site.com
58
+
59
+ # Only the AI-licensing surface
60
+ npx ax-audit https://your-site.com --checks robots-txt,rsl,content-negotiation
61
+
62
+ # Batch, 4 at a time, machine-readable
63
+ npx ax-audit $(cat urls.txt) --concurrency 4 --json > batch.json
64
+
65
+ # Show me only what is broken
66
+ npx ax-audit https://your-site.com --only-failures
67
+ ```
package/docs/concepts.md ADDED
@@ -0,0 +1,87 @@
1
+ # Concepts: the AX standards landscape
2
+
3
+ "AI Agent Experience" (AX) is the sum of the conventions a site uses to be discovered, read, governed, and transacted with by autonomous AI agents and crawlers — the way "web accessibility" is the sum of conventions for assistive technology. This page maps the standards ax-audit checks against, why each exists, and how they relate. It's the conceptual companion to the mechanical detail in [checks.md](./checks.md).
4
+
5
+ ## Why AX is its own discipline
6
+
7
+ Agents are not browsers. Three differences drive every check:
8
+
9
+ 1. **They mostly don't run JavaScript.** GPTBot, ClaudeBot, CCBot and most crawlers fetch raw HTML. A client-rendered SPA that returns an empty `<div id="root">` is, to them, a blank page. (`html-rendering`, `content-negotiation`)
10
+ 2. **They look for declared structure, not visual layout.** An agent would rather read a `/llms.txt` summary or a JSON-LD graph than infer meaning from your CSS grid. (`llms-txt`, `structured-data`, `meta-tags`, `agent-json`, `mcp`, `openapi`)
11
+ 3. **Their access is a policy and economic question, not just a technical one.** Who may crawl, for what use, at what price, under what license — these now have machine-readable answers. (`robots-txt`, Content Signals, `rsl`, `agent-access`)
12
+
13
+ Bot traffic is projected to exceed human traffic by 2029. AX is the interface layer for that shift.
14
+
15
+ ## The four families of standards
16
+
17
+ ### 1. Content discovery & readability
18
+
19
+ | Standard | What it is | Check |
20
+ | --- | --- | --- |
21
+ | **[llms.txt](https://llmstxt.org)** | A Markdown file at your root summarizing your site for LLMs, with curated links. The "sitemap for AI." | `llms-txt` |
22
+ | **Server-side rendering** | Delivering real content in the HTML response, not assembling it client-side. | `html-rendering` |
23
+ | **[Markdown for Agents](https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/)** | Content negotiation: serve clean Markdown when a client sends `Accept: text/markdown`. ~80% fewer tokens than HTML. | `content-negotiation` |
24
+ | **schema.org / JSON-LD** | Structured data describing entities (Person, Organization, Product) in a graph agents can parse. | `structured-data` |
25
+ | **Sitemaps** | The classic XML index, still how crawlers enumerate your URLs. | `sitemap` |
26
+
27
+ These answer: *can an agent find your content and actually read it?*
28
+
29
+ ### 2. Agent interaction surface
30
+
31
+ | Standard | What it is | Check |
32
+ | --- | --- | --- |
33
+ | **[A2A — Agent2Agent](https://a2a-protocol.org)** | An "Agent Card" at `/.well-known/agent.json` advertising your agent's identity and skills, so other agents can interoperate. | `agent-json` |
34
+ | **[MCP — Model Context Protocol](https://modelcontextprotocol.io)** | A manifest at `/.well-known/mcp.json` describing tools and resources an agent can call. The emerging standard for exposing capabilities to LLMs. | `mcp` |
35
+ | **[OpenAPI](https://www.openapis.org)** | The long-standing machine-readable API description; agents use it to call your endpoints. | `openapi` |
36
+ | **Emerging discovery files** | `ai.txt`, `genai.txt`, `ai-plugin.json`, `agents.json`, `nlweb.json`: competing, early-stage conventions, scored as a coverage bonus. | `well-known-ai` |
37
+ | **AI meta tags & discovery links** | `ai:*` meta tags and `rel="alternate"` links pointing agents to your llms.txt / agent.json. | `meta-tags` |
38
+
39
+ These answer: *once an agent arrives, can it understand what you offer and act on it?*
40
+
41
+ ### 3. Access governance & licensing
42
+
43
+ This is the newest and fastest-moving family — the response to "AI scraped my content and now competes with me."
44
+
45
+ | Standard | What it is | Check |
46
+ | --- | --- | --- |
47
+ | **[Robots Exclusion Protocol](https://www.rfc-editor.org/rfc/rfc9309.html)** | The original robots.txt — *who* may crawl *what*. ax-audit grades coverage of 48 known AI crawlers. | `robots-txt` |
48
+ | **[Content Signals](https://contentsignals.org)** | A robots.txt extension (Cloudflare, CC0) declaring *how* content may be used after access: `search`, `ai-input`, `ai-train`. Served by default on 3.8M+ Cloudflare domains. | `robots-txt` (findings) |
49
+ | **[RSL — Really Simple Licensing](https://rslstandard.org)** | A full machine-readable licensing layer (`license.xml`): permits/prohibits vocabularies, payment models (free, attribution, pay-per-crawl, pay-per-inference). Endorsed by 1,500+ publishers. | `rsl` |
50
+ | **Cloaking integrity** | Not a standard but a failure mode: your stated policy (robots.txt allows GPTBot) contradicting enforcement (WAF returns 403). | `agent-access` |
51
+
52
+ These answer: *have you expressed your access and usage policy in a form agents can honor — and does your infrastructure actually match it?*
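The "cloaking integrity" row boils down to comparing what robots.txt says with what a probe actually receives. The sketch below assumes you already know whether robots.txt allows a given crawler and the HTTP status returned to a request sent with that crawler's user-agent. `compareIntent` and the verdict names are hypothetical, not the `agent-access` check's real output.

```typescript
// Hypothetical classification of stated policy versus observed enforcement.
type AccessVerdict = "no-conflict" | "policy-conflict" | "inconclusive";

function compareIntent(robotsAllows: boolean, probeStatus: number): AccessVerdict {
  const served = probeStatus >= 200 && probeStatus < 300;
  const blocked = probeStatus === 401 || probeStatus === 403;
  // Redirects, 404s, rate limits and 5xx say nothing clear about policy.
  if (!served && !blocked) return "inconclusive";
  // robots.txt welcomes the crawler, but the edge refuses it.
  if (robotsAllows && blocked) return "policy-conflict";
  // robots.txt is advisory, so a disallowed-but-served response is not flagged here.
  return "no-conflict";
}
```

A `policy-conflict` result is a hint rather than proof: a WAF that verifies bot identity may reject an unsigned probe while still admitting the real crawler.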
53
+
54
+ The progression is one of increasing expressiveness: robots.txt says **who/where**, Content Signals adds **how it may be used**, RSL adds **under what license and price**.
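To make the "how it may be used" layer concrete, here is a minimal sketch that reads Content Signals out of a robots.txt body. It assumes the comma-separated `key=yes|no` form (for example `Content-Signal: search=yes, ai-train=no`) and, as a simplification, ignores `User-agent` grouping, so later lines overwrite earlier ones. `parseContentSignals` is a hypothetical name, not part of ax-audit's API.

```typescript
// Hypothetical parser: collect Content-Signal directives from robots.txt text.
type ContentSignals = Record<string, boolean>;

function parseContentSignals(robotsTxt: string): ContentSignals {
  const signals: ContentSignals = {};
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    // Drop trailing comments, then look for the directive (case-insensitive).
    const line = (rawLine.split("#")[0] ?? "").trim();
    const match = /^content-signal\s*:\s*(.*)$/i.exec(line);
    if (match === null) continue;
    for (const pair of (match[1] ?? "").split(",")) {
      const eq = pair.indexOf("=");
      if (eq === -1) continue;
      const key = pair.slice(0, eq).trim().toLowerCase();
      const value = pair.slice(eq + 1).trim().toLowerCase();
      // Only record well-formed yes/no values; anything else is skipped.
      if (key !== "" && (value === "yes" || value === "no")) {
        signals[key] = value === "yes";
      }
    }
  }
  return signals;
}
```

A signal that is absent from the result (for instance `ai-input` when only `search` and `ai-train` are declared) means the site has expressed no preference for that use, which is different from `no`.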
55
+
56
+ ### 4. Transport, efficiency & hygiene
57
+
58
+ | Standard | What it is | Check |
59
+ | --- | --- | --- |
60
+ | **TLS / HSTS** | HTTPS everywhere; many agents refuse plaintext origins. | `tls-https` |
61
+ | **HTTP security & discovery headers** | Security headers plus `Link` headers advertising your AI files. | `http-headers` |
62
+ | **Compression & conditional GET** | Brotli/gzip and `ETag`/`304` — crawl cost matters when bots dominate traffic. | `crawl-efficiency` |
63
+ | **[RFC 9116 security.txt](https://www.rfc-editor.org/rfc/rfc9116)** | A machine-readable security contact. | `security-txt` |
64
+ | **SEO basics** | Title, description, canonical, lang, hreflang — agents use the same head-tag fundamentals search engines do. | `seo-basics` |
65
+
66
+ These answer: *is the connection trustworthy, cheap, and well-formed?*
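The conditional GET behaviour behind `ETag`/`304` can be sketched as a comparison between the client's `If-None-Match` header and the resource's current entity tag. This is an illustration under simplifying assumptions (tags never contain commas, comparison is weak, so a `W/` prefix is ignored). `isNotModified` is a hypothetical helper, not how `crawl-efficiency` is implemented.

```typescript
// Hypothetical helper: should the server answer 304 Not Modified?
function stripWeak(tag: string): string {
  // Weak comparison treats W/"abc" and "abc" as the same tag.
  return tag.startsWith("W/") ? tag.slice(2) : tag;
}

function isNotModified(ifNoneMatch: string | undefined, currentEtag: string): boolean {
  if (ifNoneMatch === undefined) return false;
  const header = ifNoneMatch.trim();
  if (header === "") return false;
  // "*" matches any existing representation.
  if (header === "*") return true;
  const current = stripWeak(currentEtag.trim());
  return header
    .split(",")
    .map((tag) => stripWeak(tag.trim()))
    .some((tag) => tag === current);
}
```

When this returns `true`, the server can send `304` with no body, so a crawler that revisits unchanged pages pays only for headers.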
67
+
68
+ ## On the horizon (not yet scored)
69
+
70
+ Two standards are maturing and worth watching:
71
+
72
+ - **[Web Bot Auth](https://datatracker.ietf.org/doc/draft-meunier-web-bot-auth-architecture/)** — cryptographic crawler verification via HTTP Message Signatures (RFC 9421). Bots sign requests with a key published at `/.well-known/http-message-signatures-directory`; sites verify identity instead of guessing from user-agent strings. Already implemented by Cloudflare and Google (`agent.bot.goog`). It directly affects the `agent-access` check: a WAF using Web Bot Auth may pass a real, signed crawler while rejecting ax-audit's unsigned probe — which is why that check's findings carry an explicit verified-bots caveat.
73
+ - **Pay-per-crawl / HTTP 402** — Cloudflare and the RSL payment vocabulary point toward metered, paid agent access. RSL already encodes the terms; enforcement protocols (Open License Protocol, x402) are emerging.
74
+
75
+ ## How the families compose
76
+
77
+ A fully AX-ready site tells a coherent story across all four:
78
+
79
+ > "Here's my content in a form you can read **(family 1)**, here's the interface to interact with me **(family 2)**, here's exactly who may use it, how, and under what license **(family 3)**, over a fast and trustworthy connection **(family 4)**."
80
+
81
+ ax-audit's weighting reflects today's leverage: the highest-impact, most-adopted checks (`llms-txt`, `robots-txt`, `html-rendering`, `structured-data`, `http-headers`) carry the most weight. Most of them belong to family 1, with one each from families 3 and 4. The newer governance and efficiency standards are informational in 3.x (real and worth adopting, but still stabilizing) and gain weight in v4.0.
82
+
83
+ ## See also
84
+
85
+ - [getting-started.md](./getting-started.md) — run your first audit
86
+ - [checks.md](./checks.md) — exact scoring per standard
87
+ - The [remediation guides](https://lucioduran.com/projects/ax-audit/guides) — how to implement each one