ax-audit 3.0.0 → 3.6.0
- package/CHANGELOG.md +76 -0
- package/README.md +61 -221
- package/dist/checks/agent-access.d.ts +16 -0
- package/dist/checks/agent-access.d.ts.map +1 -0
- package/dist/checks/agent-access.js +110 -0
- package/dist/checks/agent-access.js.map +1 -0
- package/dist/checks/content-negotiation.d.ts +4 -0
- package/dist/checks/content-negotiation.d.ts.map +1 -0
- package/dist/checks/content-negotiation.js +138 -0
- package/dist/checks/content-negotiation.js.map +1 -0
- package/dist/checks/crawl-efficiency.d.ts +4 -0
- package/dist/checks/crawl-efficiency.d.ts.map +1 -0
- package/dist/checks/crawl-efficiency.js +122 -0
- package/dist/checks/crawl-efficiency.js.map +1 -0
- package/dist/checks/index.d.ts.map +1 -1
- package/dist/checks/index.js +8 -0
- package/dist/checks/index.js.map +1 -1
- package/dist/checks/robots-txt.d.ts +20 -0
- package/dist/checks/robots-txt.d.ts.map +1 -1
- package/dist/checks/robots-txt.js +111 -3
- package/dist/checks/robots-txt.js.map +1 -1
- package/dist/checks/rsl.d.ts +6 -0
- package/dist/checks/rsl.d.ts.map +1 -0
- package/dist/checks/rsl.js +252 -0
- package/dist/checks/rsl.js.map +1 -0
- package/dist/cli.d.ts.map +1 -1
- package/dist/cli.js +20 -2
- package/dist/cli.js.map +1 -1
- package/dist/constants.d.ts +17 -0
- package/dist/constants.d.ts.map +1 -1
- package/dist/constants.js +42 -1
- package/dist/constants.js.map +1 -1
- package/dist/fetcher.d.ts +7 -3
- package/dist/fetcher.d.ts.map +1 -1
- package/dist/fetcher.js +68 -30
- package/dist/fetcher.js.map +1 -1
- package/dist/index.d.ts +2 -1
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +1 -0
- package/dist/index.js.map +1 -1
- package/dist/orchestrator.d.ts +2 -2
- package/dist/orchestrator.d.ts.map +1 -1
- package/dist/orchestrator.js +13 -6
- package/dist/orchestrator.js.map +1 -1
- package/dist/reporter/index.d.ts.map +1 -1
- package/dist/reporter/index.js +7 -0
- package/dist/reporter/index.js.map +1 -1
- package/dist/reporter/markdown.d.ts +8 -0
- package/dist/reporter/markdown.d.ts.map +1 -0
- package/dist/reporter/markdown.js +76 -0
- package/dist/reporter/markdown.js.map +1 -0
- package/dist/scorer.d.ts.map +1 -1
- package/dist/scorer.js +8 -0
- package/dist/scorer.js.map +1 -1
- package/dist/types.d.ts +17 -2
- package/dist/types.d.ts.map +1 -1
- package/docs/api.md +200 -0
- package/docs/architecture.md +88 -0
- package/docs/checks.md +322 -0
- package/docs/ci.md +89 -0
- package/docs/cli.md +67 -0
- package/docs/concepts.md +87 -0
- package/docs/faq.md +77 -0
- package/docs/getting-started.md +101 -0
- package/package.json +2 -1
package/docs/faq.md
ADDED
|
@@ -0,0 +1,77 @@
|
|
|
1
|
+
# FAQ & Troubleshooting
|
|
2
|
+
|
|
3
|
+
## Scores & results
|
|
4
|
+
|
|
5
|
+
### Why did my score change after upgrading ax-audit?
|
|
6
|
+
|
|
7
|
+
In 3.x, score changes on the same site are treated as **breaking** and only happen in major or minor releases that explicitly say so. The 3.0.0 release redistributed weights across 14 checks and added Content-Type penalties — see its CHANGELOG entry. Every check added since (3.1.0–3.6.0) ships at **weight 0** precisely so your score and baselines don't move. To track changes deliberately, use `--baseline` (see [cli.md](./cli.md)).
|
|
8
|
+
|
|
9
|
+
### Why is my score lower than my Lighthouse / SEO score?
|
|
10
|
+
|
|
11
|
+
ax-audit measures the *AI-agent* surface, not performance, accessibility, or human SEO. A fast, beautiful site can still score poorly if it has no `llms.txt`, ships an empty SPA shell to non-JS crawlers, and exposes no structured data. That gap is the reason the tool exists.
|
|
12
|
+
|
|
13
|
+
### A check shows 0 but the file exists — why?
|
|
14
|
+
|
|
15
|
+
Most "not found" hard-fails mean the request didn't return a 2xx. Common causes: a redirect chain that breaks, an error status on that path, or, most often, a WAF or bot rule blocking ax-audit's request (see "My WAF is blocking ax-audit itself" below). Re-run with `--verbose` to see the exact status per request.
|
|
16
|
+
|
|
17
|
+
### What's the difference between a weighted and an informational check?
|
|
18
|
+
|
|
19
|
+
Weighted checks (14) sum to 100% and determine your overall score. Informational checks (4: `content-negotiation`, `rsl`, `agent-access`, `crawl-efficiency`) run and report full findings but contribute 0 to the score in 3.x. They gain weight in v4.0. The Content Signals findings inside `robots-txt` are likewise informational.
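A minimal sketch of why a weight-0 check cannot move the overall score, assuming the score is a weighted mean normalized by total weight. The `CheckResult` shape, the `overallScore` name, and the rounding are illustrative assumptions, not ax-audit's actual types; the real ones are described in [api.md](./api.md).

```typescript
// Hypothetical shape for illustration only.
interface CheckResult {
  id: string;
  score: number; // 0-100
  weight: number; // share of the overall score; 0 for informational checks
}

// Weighted mean of check scores. A check with weight 0 adds nothing to
// either the numerator or the denominator, so it cannot change the result.
function overallScore(results: CheckResult[]): number {
  const totalWeight = results.reduce((sum, r) => sum + r.weight, 0);
  if (totalWeight === 0) return 0;
  const weighted = results.reduce((sum, r) => sum + r.score * r.weight, 0);
  return Math.round(weighted / totalWeight);
}

const weightedOnly: CheckResult[] = [
  { id: "llms-txt", score: 100, weight: 11 },
  { id: "robots-txt", score: 50, weight: 11 },
];

// Same results plus a failing informational check.
const withInformational: CheckResult[] = [
  ...weightedOnly,
  { id: "rsl", score: 0, weight: 0 },
];
```

Both `overallScore(weightedOnly)` and `overallScore(withInformational)` evaluate to the same number, which is the property that keeps baselines stable when informational checks are added.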
|
|
20
|
+
|
|
21
|
+
## False positives & caveats
|
|
22
|
+
|
|
23
|
+
### `agent-access` flags crawlers as blocked, but my real crawlers work fine
|
|
24
|
+
|
|
25
|
+
This is the most important caveat in the tool. ax-audit's probe sends a user-agent *containing* the crawler token (e.g. `...GPTBot/1.0`) but it is **not** the real, verified crawler. If your WAF verifies bots cryptographically ([Web Bot Auth](./concepts.md)) or by IP range, it will correctly pass the genuine GPTBot while rejecting ax-audit's unverified probe. **Before changing any WAF rule, confirm against your WAF logs** whether real crawler traffic is actually being served. If it is, this finding is a false positive for your setup.
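The difference between a WAF that trusts the user-agent token and one that verifies the sender can be sketched as follows. Everything here is hypothetical: the `ProbeRequest` shape, both policy functions, the user-agent strings, and the IP ranges (taken from the RFC 5737 documentation blocks) are stand-ins, not real crawler ranges or ax-audit's actual probe.

```typescript
// Hypothetical request shape for illustration.
interface ProbeRequest {
  userAgent: string;
  ip: string;
}

// Convert a dotted IPv4 address to an integer.
function ipv4ToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// True when `ip` falls inside the IPv4 CIDR block `cidr`.
function inCidr(ip: string, cidr: string): boolean {
  const [base = "", bitsText = "32"] = cidr.split("/");
  const size = 2 ** (32 - Number(bitsText));
  return Math.floor(ipv4ToInt(ip) / size) === Math.floor(ipv4ToInt(base) / size);
}

// Policy A: trust any user-agent that contains the crawler token.
function tokenOnlyWafAllows(req: ProbeRequest, token: string): boolean {
  return req.userAgent.includes(token);
}

// Policy B: additionally require the source IP to be in the crawler's range.
function verifyingWafAllows(req: ProbeRequest, token: string, ranges: string[]): boolean {
  return req.userAgent.includes(token) && ranges.some((range) => inCidr(req.ip, range));
}

const crawlerRanges = ["192.0.2.0/24"]; // placeholder, not a real crawler range
const placeholderUa = "Mozilla/5.0 (compatible; GPTBot/1.0)"; // placeholder UA
const realCrawler: ProbeRequest = { userAgent: placeholderUa, ip: "192.0.2.10" };
const auditProbe: ProbeRequest = { userAgent: placeholderUa, ip: "203.0.113.7" };
```

Under policy B the genuine crawler passes and the audit probe is rejected even though both send the same token, which is the false-positive case described above. Under policy A both pass, so a block reported there is more likely to affect real crawlers too.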
|
|
26
|
+
|
|
27
|
+
### `well-known-ai` is low — should I worry?
|
|
28
|
+
|
|
29
|
+
No. It's scored as a *coverage bonus* over five emerging, partly competing files (`ai.txt`, `genai.txt`, `ai-plugin.json`, `agents.json`, `nlweb.json`). None is universally adopted, so a low score here is not a defect. Implement the ones relevant to your stack.
|
|
30
|
+
|
|
31
|
+
### `crawl-efficiency` says no compression, but my CDN compresses
|
|
32
|
+
|
|
33
|
+
The check reads the `Content-Encoding` header on the response it received. If a proxy between ax-audit and your origin strips or fails to negotiate compression, you'll see this. Verify directly: `curl -sI -H 'Accept-Encoding: br, gzip' https://your-site.com | grep -i content-encoding`.
|
|
34
|
+
|
|
35
|
+
### `content-negotiation` fails but I don't serve Markdown
|
|
36
|
+
|
|
37
|
+
That's expected — most sites don't yet. It's informational (weight 0). Adopt it when you're ready; the [guide](https://lucioduran.com/projects/ax-audit/guides/content-negotiation) covers Cloudflare/Vercel zero-code options.
|
|
38
|
+
|
|
39
|
+
## Running the tool
|
|
40
|
+
|
|
41
|
+
### My WAF is blocking ax-audit itself
|
|
42
|
+
|
|
43
|
+
ax-audit sends a `User-Agent` of `ax-audit/<version> (+https://github.com/lucioduran/ax-audit)`. If your firewall challenges unknown agents, allowlist that UA (or the IP you run from) for the duration of the audit. Note that several checks deliberately send *other* user-agents (`agent-access`) and unusual `Accept` headers (`content-negotiation`) — a WAF rejecting those is itself a finding, not a tool bug.
|
|
44
|
+
|
|
45
|
+
### How do I audit a staging site behind auth?
|
|
46
|
+
|
|
47
|
+
ax-audit has no auth support today. Options: run it from inside the network perimeter, temporarily allowlist its UA/IP, or audit a public preview deployment (the typical CI pattern — see [ci.md](./ci.md)).
|
|
48
|
+
|
|
49
|
+
### Audits are slow / flaky on cold deployments
|
|
50
|
+
|
|
51
|
+
Transient failures (timeouts, 5xx) retry automatically with backoff — raise `--retries` (default 2) for very cold preview environments and `--timeout` (default 10000ms) for slow origins. In batch mode, `--concurrency` speeds up multi-URL runs.
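To get a feel for what raising `--retries` buys you, here is a sketch of a retry schedule. The text above only says "backoff"; the exponential doubling, the 500 ms base, and both function names are assumptions for illustration, not ax-audit's documented behavior.

```typescript
// Assumed schedule: exponential doubling from a base delay.
// Returns the wait (in ms) before each retry attempt.
function backoffDelaysMs(retries: number, baseMs = 500): number[] {
  return Array.from({ length: retries }, (_, attempt) => baseMs * 2 ** attempt);
}

// Timeouts and 5xx responses are treated as transient, as the text states;
// any other outcome is final and is not retried.
function isTransient(outcome: number | "timeout"): boolean {
  return outcome === "timeout" || (outcome >= 500 && outcome <= 599);
}
```

Under this assumed schedule, the default of 2 retries waits 500 ms and then 1000 ms, while 4 retries adds waits of 2000 ms and 4000 ms, giving a cold preview more time to warm up.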
|
|
52
|
+
|
|
53
|
+
### Can I run only some checks?
|
|
54
|
+
|
|
55
|
+
Yes: `--checks llms-txt,robots-txt,rsl`. Note the overall score then averages *only* those checks, so a subset run isn't comparable to a full-audit score. Unknown IDs error out with the valid list.
|
|
56
|
+
|
|
57
|
+
### Is there rate limiting I should know about?
|
|
58
|
+
|
|
59
|
+
The tool itself doesn't rate-limit, but it makes several requests per audit (one per check, plus follow-ups for conditional GET, content negotiation, and the 8 `agent-access` probes). All responses are cached per run, so repeated checks of the same URL don't re-fetch. Be considerate auditing sites you don't own.
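The per-run caching idea can be sketched as a small memoizing wrapper. This is an illustration under assumptions: `createRunCache` is a hypothetical name, and the loader here returns a string synchronously, whereas a real fetcher would more plausibly cache promises.

```typescript
// Hypothetical per-run cache: the loader runs once per URL,
// and later lookups for the same URL reuse the stored result.
function createRunCache<V>(load: (url: string) => V): (url: string) => V {
  const cache = new Map<string, V>();
  return (url) => {
    if (!cache.has(url)) cache.set(url, load(url));
    return cache.get(url) as V;
  };
}

let loads = 0;
const cachedLoad = createRunCache((url) => {
  loads += 1; // count how often the underlying loader actually runs
  return `body of ${url}`;
});

// Two checks asking for the same file trigger a single load.
cachedLoad("https://example.com/robots.txt");
cachedLoad("https://example.com/robots.txt");
```

Because the cache lives only for the lifetime of one run, a fresh audit always sees the current state of the site.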
|
|
60
|
+
|
|
61
|
+
## Integration
|
|
62
|
+
|
|
63
|
+
### Does it work in CI?
|
|
64
|
+
|
|
65
|
+
Yes — exit codes gate the build (`0` = Good/Excellent, `1` = Fair/Poor). See [ci.md](./ci.md) for GitHub Actions recipes including PR comments via `--output markdown` and regression gates via `--baseline`.
|
|
66
|
+
|
|
67
|
+
### Can I consume results programmatically?
|
|
68
|
+
|
|
69
|
+
Yes — `import { audit } from 'ax-audit'` returns a typed `AuditReport`. See [api.md](./api.md).
|
|
70
|
+
|
|
71
|
+
### How do I generate the files ax-audit checks for?
|
|
72
|
+
|
|
73
|
+
Use [ax-init](https://github.com/lucioduran/ax-init). It generates `llms.txt`, `robots.txt`, `agent.json`, `mcp.json`, `security.txt`, structured data, and header snippets, which you then verify with `npx ax-audit`.
|
|
74
|
+
|
|
75
|
+
## Still stuck?
|
|
76
|
+
|
|
77
|
+
Open an issue at [github.com/lucioduran/ax-audit/issues](https://github.com/lucioduran/ax-audit/issues) with the output of `npx ax-audit <url> --verbose`.
|
|
package/docs/getting-started.md
ADDED
|
@@ -0,0 +1,101 @@
|
|
|
1
|
+
# Getting Started
|
|
2
|
+
|
|
3
|
+
This walkthrough takes you from zero to a passing AX score: run your first audit, learn to read the report, and fix findings in the order that moves your score most.
|
|
4
|
+
|
|
5
|
+
## 1. Run your first audit
|
|
6
|
+
|
|
7
|
+
No install needed:
|
|
8
|
+
|
|
9
|
+
```bash
npx ax-audit https://your-site.com
```
|
|
12
|
+
|
|
13
|
+
You get a report like:
|
|
14
|
+
|
|
15
|
+
```
AX Audit Report
https://your-site.com

██████████████████████░░░░░░░░░░░░░░░░░░ 56/100 Fair

LLMs.txt (0/100)
FAIL /llms.txt not found
...
```
|
|
25
|
+
|
|
26
|
+
Three things to locate immediately:
|
|
27
|
+
|
|
28
|
+
- **The overall score and grade.** 0–100, weighted across 14 checks. Grades: Excellent (≥90), Good (≥70), Fair (≥50), Poor (<50). The CLI exits `0` at Good or better — that is the CI gate.
|
|
29
|
+
- **Per-check scores.** Each check is independent and scored 0–100. The weight of each check is in [checks.md](./checks.md).
|
|
30
|
+
- **Findings.** Every `WARN`/`FAIL` line carries a hint and a `learnMoreUrl` to a remediation guide with copy-pasteable fixes.
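The grade bands and the exit-code rule from the first bullet can be written down as a short sketch. The names `gradeFor` and `exitCodeFor` are illustrative assumptions, not part of ax-audit's API; only the thresholds and the exit rule come from the text.

```typescript
// Hypothetical helpers; thresholds are the ones listed above.
type Grade = "Excellent" | "Good" | "Fair" | "Poor";

function gradeFor(score: number): Grade {
  if (score >= 90) return "Excellent";
  if (score >= 70) return "Good";
  if (score >= 50) return "Fair";
  return "Poor";
}

// Exit 0 at Good or better, 1 otherwise. This is the CI gate.
function exitCodeFor(grade: Grade): 0 | 1 {
  return grade === "Excellent" || grade === "Good" ? 0 : 1;
}
```

For the sample report above, `gradeFor(56)` is `"Fair"`, so a CI job gated on the exit code would fail until the score reaches 70.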
|
|
31
|
+
|
|
32
|
+
To see only what needs fixing:
|
|
33
|
+
|
|
34
|
+
```bash
npx ax-audit https://your-site.com --only-failures
```
|
|
37
|
+
|
|
38
|
+
## 2. Understand what you're optimizing
|
|
39
|
+
|
|
40
|
+
AI agents interact with your site differently than browsers: most don't execute JavaScript, they look for machine-readable discovery files, and they respect (or at least read) your declared crawler policy. If you're new to the standards involved (llms.txt, A2A, MCP, RSL, Content Signals), read [concepts.md](./concepts.md) first. The audit measures three layers:
|
|
41
|
+
|
|
42
|
+
1. **Can agents find and read your content?** (`html-rendering`, `robots-txt`, `sitemap`, `tls-https`, `agent-access`)
2. **Did you publish the AI-specific surface?** (`llms-txt`, `agent-json`, `mcp`, `openapi`, `well-known-ai`, `meta-tags`, `structured-data`)
3. **Is the interaction efficient and well-governed?** (`content-negotiation`, `crawl-efficiency`, `rsl`, Content Signals, `http-headers`, `security-txt`, `seo-basics`)
|
|
45
|
+
|
|
46
|
+
## 3. Fix in impact order
|
|
47
|
+
|
|
48
|
+
The fastest path from Fair to Good, by weight and typical effort:
|
|
49
|
+
|
|
50
|
+
| Step | Check | Weight | Typical effort |
| --- | --- | --- | --- |
| 1 | Create `/llms.txt` | 11% | 30 minutes; it's a Markdown file, and `npx ax-init` generates it |
| 2 | Configure `robots.txt` for the 8 core AI crawlers | 11% | 15 minutes; `npx ax-init` generates this too |
| 3 | Verify server-rendered content | 9% | Free if you SSR; significant if you ship an SPA shell |
| 4 | Add JSON-LD structured data | 9% | 1–2 hours |
| 5 | Security + discovery headers | 9% | 30 minutes of server config |
| 6 | `agent.json` + `mcp.json` | 14% combined | An hour with the spec links in the guides |
|
|
58
|
+
|
|
59
|
+
The remaining weighted checks (`seo-basics`, `security-txt`, `meta-tags`, `openapi`, `tls-https`, `sitemap`, `well-known-ai`) are mostly configuration; the remediation guides give exact snippets for Nginx, Vercel, Netlify, and Express.
|
|
60
|
+
|
|
61
|
+
Re-run after each fix. Responses are cached within a run, so each audit stays fast and cheap.
|
|
62
|
+
|
|
63
|
+
## 4. Lock in your progress with a baseline
|
|
64
|
+
|
|
65
|
+
Once you reach a score you're happy with, freeze it:
|
|
66
|
+
|
|
67
|
+
```bash
npx ax-audit https://your-site.com --save-baseline .ax-baseline.json
git add .ax-baseline.json && git commit -m "chore: AX baseline"
```
|
|
71
|
+
|
|
72
|
+
From then on, compare every run against it:
|
|
73
|
+
|
|
74
|
+
```bash
npx ax-audit https://your-site.com --baseline .ax-baseline.json --fail-on-regression 5
```
|
|
77
|
+
|
|
78
|
+
This catches drift you didn't cause — a CDN toggle, a WAF rule, a header dropped in a refactor. Wire it into CI with the recipes in [ci.md](./ci.md).
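The regression gate can be pictured as a simple comparison. This sketch assumes the `--fail-on-regression` value is a drop measured in score points and that a drop exactly equal to the threshold still passes; the text does not specify either detail, and `isRegression` is a hypothetical name. See [cli.md](./cli.md) for the flag's actual semantics.

```typescript
// Assumption: the threshold is an allowed drop in score points.
// Returns true when the current score fell by more than `maxDrop`.
function isRegression(baselineScore: number, currentScore: number, maxDrop: number): boolean {
  return baselineScore - currentScore > maxDrop;
}
```

With a baseline of 80 and a threshold of 5, a run scoring 76 would pass and a run scoring 74 would fail under these assumptions.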
|
|
79
|
+
|
|
80
|
+
## 5. Look at the informational checks
|
|
81
|
+
|
|
82
|
+
Four checks report findings without affecting your score yet (they will in v4.0): `content-negotiation`, `rsl`, `agent-access`, `crawl-efficiency`. Treat them as the early-warning lane — they cover the newest standards, and fixing them now means v4.0 changes nothing for you.
|
|
83
|
+
|
|
84
|
+
The one to check first is `agent-access`: it detects the failure mode you cannot see — your robots.txt allows GPTBot while your WAF returns it a 403:
|
|
85
|
+
|
|
86
|
+
```bash
npx ax-audit https://your-site.com --checks agent-access
```
|
|
89
|
+
|
|
90
|
+
## Common first-run questions
|
|
91
|
+
|
|
92
|
+
- **"My score seems harsh."** The audit measures the AI-agent surface, not site quality. A beautiful SPA with no llms.txt, no structured data, and an empty `#root` div is genuinely poor AX — that's the point of the tool.
|
|
93
|
+
- **"A check crashed / network error."** Transient failures retry automatically (`--retries`, default 2). For slow staging environments raise `--timeout`.
|
|
94
|
+
- **"Which findings are safe to ignore?"** See the [FAQ](./faq.md), notably the `agent-access` verified-bots caveat and `well-known-ai`, which is a coverage bonus rather than a baseline requirement.
|
|
95
|
+
|
|
96
|
+
## Next steps
|
|
97
|
+
|
|
98
|
+
- [checks.md](./checks.md) — exact scoring of all 18 checks
|
|
99
|
+
- [concepts.md](./concepts.md) — the AX standards landscape explained
|
|
100
|
+
- [cli.md](./cli.md) — every flag · [ci.md](./ci.md) — CI recipes · [api.md](./api.md) — programmatic use
|
|
101
|
+
- [ax-init](https://github.com/lucioduran/ax-init) — generates most of the files this tool audits
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "ax-audit",
|
|
3
|
-
"version": "3.
|
|
3
|
+
"version": "3.6.0",
|
|
4
4
|
"description": "Audit websites for AI Agent Experience (AX) readiness. Lighthouse for AI Agents.",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"license": "Apache-2.0",
|
|
@@ -40,6 +40,7 @@
|
|
|
40
40
|
"files": [
|
|
41
41
|
"bin/",
|
|
42
42
|
"dist/",
|
|
43
|
+
"docs/",
|
|
43
44
|
"LICENSE",
|
|
44
45
|
"README.md",
|
|
45
46
|
"CHANGELOG.md"
|