@diegovelasquezweb/a11y-engine 0.1.3 → 0.1.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -9,6 +9,32 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

  ---

+ ## [0.1.3] — 2026-03-14
+
+ ### Added
+
+ - **Multi-engine scanning**: three independent engines now run against each page:
+   - **axe-core** (via `@axe-core/playwright`) — primary WCAG rule engine injected into the live page
+   - **CDP** (Chrome DevTools Protocol) — queries the browser's accessibility tree for missing accessible names and aria-hidden on focusable elements
+   - **pa11y** (HTML CodeSniffer via Puppeteer) — catches heading hierarchy, link purpose, and form association issues
+ - Cross-engine merge and deduplication in `mergeViolations()` — removes duplicate findings across axe, CDP, and pa11y based on rule equivalence and selector matching
+ - Real-time `progress.json` with per-engine step tracking and finding counts (`found` for each engine, `merged` total after dedup)
+ - `--axe-tags` CLI flag for filtering axe-core WCAG tag sets (also determines the pa11y standard)
+ - Non-visible element skip list for screenshots (`<meta>`, `<link>`, `<style>`, `<script>`, `<title>`, `<base>`) — prevents timeout warnings on elements that cannot be scrolled into view
+
+ ### Changed
+
+ - `a11y-scan-results.json` now contains merged violations from all three engines (previously axe-core only)
+ - CDP and pa11y violations include a `source` field (`"cdp"` or `"pa11y"`) identifying the engine that produced them; axe-core violations have no `source` field, for backwards compatibility
+ - README rewritten to reflect multi-engine architecture
+ - All documentation (`architecture.md`, `cli-handbook.md`, `outputs.md`) updated to describe the three-engine pipeline, merge/dedup logic, progress tracking, and dual browser requirements
+
+ ### Fixed
+
+ - Screenshot capture no longer attempts to scroll non-visible `<head>` elements into view
+
+ ---
+
  ## [0.1.2] — 2026-03-13

  ### Fixed
package/README.md CHANGED
@@ -1,39 +1,77 @@
  # @diegovelasquezweb/a11y-engine

- WCAG 2.2 AA accessibility audit engine. Runs Playwright + axe-core scans, enriches findings with fix intelligence, and produces structured artifacts for developers, agents, and stakeholders.
+ Multi-engine WCAG 2.2 AA accessibility audit engine. Combines three scanning engines (axe-core, Chrome DevTools Protocol, and pa11y), merges and deduplicates their findings, enriches results with fix intelligence, and produces structured artifacts for developers, agents, and stakeholders.

  ## What it is

  A Node.js CLI and programmatic engine that:

  1. Crawls a target URL and discovers routes automatically
- 2. Runs axe-core WCAG 2.2 AA checks across all discovered pages
- 3. Optionally scans project source code for patterns axe cannot detect at runtime
- 4. Enriches each finding with stack-aware fix guidance, selectors, and verification commands
- 5. Produces a full artifact set: JSON data, Markdown remediation guide, HTML dashboard, PDF compliance report, and manual testing checklist
+ 2. Runs three independent accessibility engines against each page:
+    - **axe-core**: industry-standard WCAG rule engine, injected into the live page via Playwright
+    - **CDP** (Chrome DevTools Protocol): queries the browser's accessibility tree directly for issues axe may miss (missing accessible names, aria-hidden on focusable elements)
+    - **pa11y** (HTML CodeSniffer): catches WCAG violations around heading hierarchy, link purpose, and form associations
+ 3. Merges and deduplicates findings across all three engines
+ 4. Optionally scans project source code for patterns no runtime engine can detect
+ 5. Enriches each finding with stack-aware fix guidance, selectors, and verification commands
+ 6. Produces a full artifact set: JSON data, Markdown remediation guide, HTML dashboard, PDF compliance report, and manual testing checklist

  ## Why use this engine

  | Capability | With this engine | Without |
  | :--- | :--- | :--- |
- | **Full WCAG 2.2 Coverage** | axe-core runtime scan + source code pattern scanner | Runtime scan only: misses CSS/source-level issues |
+ | **Multi-engine scanning** | axe-core + CDP accessibility tree + pa11y (HTML CodeSniffer) with cross-engine deduplication | Single engine: higher false-negative rate |
+ | **Full WCAG 2.2 Coverage** | Three runtime engines + source code pattern scanner | Runtime scan only — misses structural and source-level issues |
  | **Fix Intelligence** | Stack-aware patches with code snippets tailored to detected framework | Raw rule violations with no remediation context |
  | **Structured Artifacts** | JSON + Markdown + HTML + PDF + Checklist — ready to consume or forward | Findings exist only in the terminal session |
  | **CI/Agent Integration** | Deterministic exit codes, stdout-parseable output paths, JSON schema | Requires wrapper scripting |

+ ## How the scan pipeline works
+
+ ```
+ URL
+ |
+ v
+ [1. Crawl & Discover]  sitemap.xml / BFS link crawl / explicit --routes
+ |
+ v
+ [2. Navigate]          Playwright opens each route in Chromium
+ |
+ +---> [axe-core]       Injects axe into the page, runs WCAG tag checks
+ |
+ +---> [CDP]            Opens a CDP session, reads the full accessibility tree
+ |
+ +---> [pa11y]          Launches HTML CodeSniffer via Puppeteer Chrome
+ |
+ v
+ [3. Merge & Dedup]     Combines findings, removes cross-engine duplicates
+ |
+ v
+ [4. Analyze]           Enriches with WCAG mapping, severity, fix code, framework hints
+ |
+ v
+ [5. Reports]           HTML dashboard, PDF, checklist, Markdown remediation
+ ```
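The per-route part of the pipeline above can be outlined as follows. This is a hypothetical sketch, not the package's internal code: `scanRoute`, the `engines` callbacks, and the injected `merge` function are illustrative names. It shows the one behavior the installation notes make explicit: a pa11y failure (for example, a missing Puppeteer Chrome) is logged, and the route still completes with axe-core and CDP results.

```javascript
// Hypothetical per-route orchestration; not the package's internal API.
// `engines` holds async callbacks (axe, cdp, pa11y), each resolving to an
// array of axe-shaped violations; `merge` combines the three arrays.
async function scanRoute(url, engines, merge, log = console.warn) {
  // In this sketch, errors from axe or CDP propagate to the caller.
  const axe = await engines.axe(url);
  const cdp = await engines.cdp(url);

  let pa11y = [];
  try {
    pa11y = await engines.pa11y(url); // needs Puppeteer's Chrome
  } catch (err) {
    // Non-fatal: record a warning and continue with axe + CDP results only.
    log(`pa11y checks failed (non-fatal): ${err.message}`);
  }
  return merge(axe, cdp, pa11y);
}
```

For readability, this sketch runs the engines one after another. It makes no claim about how the real scanner schedules them.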
+

  ## Installation

  ```bash
  npm install @diegovelasquezweb/a11y-engine
  npx playwright install chromium
+ npx puppeteer browsers install chrome
  ```

  ```bash
  pnpm add @diegovelasquezweb/a11y-engine
  pnpm exec playwright install chromium
+ npx puppeteer browsers install chrome
  ```

- > Chromium must be installed separately. The engine uses Playwright's bundled browser — not a system Chrome.
+ > **Two browsers are required:**
+ > - **Playwright Chromium** — used by axe-core and CDP checks
+ > - **Puppeteer Chrome** — used by pa11y (HTML CodeSniffer)
+ >
+ > These are separate browser installations. If Puppeteer Chrome is missing, pa11y checks fail with a non-fatal warning and the scan continues with axe + CDP only.

  ## Quick start

@@ -60,7 +98,7 @@ a11y-audit --base-url <url> [options]
  | :--- | :--- | :--- | :--- |
  | `--base-url` | `<url>` | (Required) | Starting URL for the audit. |
  | `--max-routes` | `<num>` | `10` | Max routes to discover and scan. |
- | `--crawl-depth` | `<num>` | `2` | BFS link-follow depth during discovery (1–3). |
+ | `--crawl-depth` | `<num>` | `2` | BFS link-follow depth during discovery (1-3). |
  | `--routes` | `<csv>` | — | Explicit path list, bypasses auto-discovery. |
  | `--project-dir` | `<path>` | — | Path to project source. Enables source pattern scanner and framework auto-detection. |

@@ -72,6 +110,7 @@ a11y-audit --base-url <url> [options]
  | `--only-rule` | `<id>` | — | Run a single axe rule (e.g. `color-contrast`). |
  | `--ignore-findings` | `<csv>` | — | Rule IDs to exclude from output. |
  | `--exclude-selectors` | `<csv>` | — | CSS selectors to skip during DOM scan. |
+ | `--axe-tags` | `<csv>` | `wcag2a,wcag2aa,wcag21a,wcag21aa,wcag22a,wcag22aa` | axe-core WCAG tag filter. Also determines the pa11y standard. |
  | `--framework` | `<name>` | — | Override auto-detected stack. Supported: `nextjs`, `gatsby`, `react`, `nuxt`, `vue`, `angular`, `astro`, `svelte`, `shopify`, `wordpress`, `drupal`. |

  ### Execution & emulation
@@ -123,8 +162,9 @@ All artifacts are written to `.audit/` relative to the package root.

  | File | Always generated | Description |
  | :--- | :--- | :--- |
- | `a11y-scan-results.json` | Yes | Raw axe-core results per route |
- | `a11y-findings.json` | Yes | Enriched findings with fix intelligence |
+ | `a11y-scan-results.json` | Yes | Raw merged results from axe-core + CDP + pa11y per route |
+ | `a11y-findings.json` | Yes | Enriched findings with fix intelligence, WCAG mapping, and severity |
+ | `progress.json` | Yes | Real-time scan progress with per-engine step status and finding counts |
  | `remediation.md` | Yes | AI-agent-optimized remediation roadmap |
  | `report.html` | With `--with-reports` | Interactive HTML dashboard |
  | `report.pdf` | With `--with-reports` | Formal compliance PDF |
@@ -132,16 +172,47 @@ All artifacts are written to `.audit/` relative to the package root.

  See [Output Artifacts](docs/outputs.md) for full schema reference.

+ ## Scan engines
+
+ ### axe-core (via @axe-core/playwright)
+
+ The primary engine. Runs Deque's axe-core rule set against the live DOM inside Playwright's Chromium. Covers the majority of automatable WCAG 2.2 AA success criteria.
+
+ ### CDP (Chrome DevTools Protocol)
+
+ Queries the browser's full accessibility tree via a CDP session. Catches issues axe may miss:
+ - Interactive elements (buttons, links, inputs) with no accessible name
+ - Focusable elements hidden with `aria-hidden`
+
+ ### pa11y (HTML CodeSniffer)
+
+ Runs Squiz's HTML CodeSniffer via Puppeteer Chrome. Catches WCAG violations around:
+ - Heading hierarchy
+ - Link purpose
+ - Form label associations
+
+ Requires a separate Chrome installation (`npx puppeteer browsers install chrome`). If Chrome is missing, pa11y logs a non-fatal warning and the scan continues with axe + CDP.
+
+ ### Merge & deduplication
+
+ After all three engines run, findings are merged and deduplicated:
+ - axe findings are added first (baseline)
+ - CDP findings are checked against axe equivalents (e.g. `cdp-missing-accessible-name` vs `button-name`) to avoid duplicates
+ - pa11y findings are checked against existing selectors to avoid triple-reporting the same element
+
  ## Troubleshooting

  **`Error: browserType.launch: Executable doesn't exist`**
  Run `npx playwright install chromium` (or `pnpm exec playwright install chromium`).

+ **`pa11y checks failed (non-fatal): Could not find Chrome`**
+ pa11y requires Puppeteer's Chrome, which is separate from Playwright's Chromium. Install it with `npx puppeteer browsers install chrome`.
+
  **`Missing required argument: --base-url`**
  The flag is required. Provide a full URL including protocol: `--base-url https://example.com`.

  **Scan returns 0 findings on an SPA**
- Use `--wait-until networkidle --wait-ms 3000` to let async content render before axe runs.
+ Use `--wait-until networkidle --wait-ms 3000` to let async content render before the engines run.

  **`--with-reports` exits without generating PDF**
  Ensure `--output` is also set and points to an `.html` file path: `--output ./audit/report.html`.
@@ -153,7 +224,7 @@ Add `--no-sandbox` via the `PLAYWRIGHT_CHROMIUM_LAUNCH_OPTIONS` env var, or run

  | Resource | Description |
  | :--- | :--- |
- | [Architecture](https://github.com/diegovelasquezweb/a11y-engine/blob/main/docs/architecture.md) | How the scanner → analyzer → report pipeline works |
+ | [Architecture](https://github.com/diegovelasquezweb/a11y-engine/blob/main/docs/architecture.md) | How the multi-engine scanner pipeline works |
  | [CLI Handbook](https://github.com/diegovelasquezweb/a11y-engine/blob/main/docs/cli-handbook.md) | Full flag reference and usage patterns |
  | [Output Artifacts](https://github.com/diegovelasquezweb/a11y-engine/blob/main/docs/outputs.md) | Schema and structure of every generated file |

@@ -0,0 +1,30 @@
+ {
+   "interactiveRoles": [
+     "button", "link", "textbox", "combobox", "listbox",
+     "menuitem", "tab", "checkbox", "radio", "switch", "slider"
+   ],
+   "rules": [
+     {
+       "id": "cdp-missing-accessible-name",
+       "condition": "interactive-no-name",
+       "impact": "serious",
+       "tags": ["wcag2a", "wcag412", "cdp-check"],
+       "help": "Interactive elements must have an accessible name",
+       "helpUrl": "https://dequeuniversity.com/rules/axe/4.11/button-name",
+       "description": "Interactive element with role \"{{role}}\" has no accessible name",
+       "failureMessage": "Element with role \"{{role}}\" has no accessible name in the accessibility tree",
+       "axeEquivalents": ["button-name", "link-name", "input-name", "aria-command-name"]
+     },
+     {
+       "id": "cdp-aria-hidden-focusable",
+       "condition": "hidden-focusable",
+       "impact": "serious",
+       "tags": ["wcag2a", "wcag412", "cdp-check"],
+       "help": "aria-hidden elements must not be focusable",
+       "helpUrl": "https://dequeuniversity.com/rules/axe/4.11/aria-hidden-focus",
+       "description": "Focusable element with role \"{{role}}\" is aria-hidden",
+       "failureMessage": "Focusable element with role \"{{role}}\" is hidden from the accessibility tree",
+       "axeEquivalents": ["aria-hidden-focus"]
+     }
+   ]
+ }
@@ -0,0 +1,53 @@
+ {
+   "ignoreByPrinciple": [
+     "Principle1.Guideline1_4.1_4_3.G18.Fail",
+     "Principle4.Guideline4_1.4_1_2.H91.A.NoContent"
+   ],
+   "impactMap": {
+     "1": "serious",
+     "2": "moderate",
+     "3": "minor"
+   },
+   "equivalenceMap": {
+     "Principle1.Guideline1_4.1_4_3.G145": "color-contrast",
+     "Principle1.Guideline1_4.1_4_3.G18": "color-contrast",
+     "Principle1.Guideline1_4.1_4_3.G145.Fail": "color-contrast",
+     "Principle1.Guideline1_4.1_4_3.G18.Fail": "color-contrast",
+     "Principle1.Guideline1_3.1_3_1.H42": "heading-order",
+     "Principle1.Guideline1_3.1_3_1.H42.2": "empty-heading",
+     "Principle1.Guideline1_3.1_3_1.H44": "label",
+     "Principle1.Guideline1_3.1_3_1.H65": "label",
+     "Principle1.Guideline1_3.1_3_1.H71": "label",
+     "Principle1.Guideline1_3.1_3_1.H85": "listitem",
+     "Principle1.Guideline1_3.1_3_1.H48": "list",
+     "Principle1.Guideline1_3.1_3_1.H39": "table-fake-caption",
+     "Principle1.Guideline1_3.1_3_1.H73": "table-fake-caption",
+     "Principle1.Guideline1_1.1_1_1.H37": "image-alt",
+     "Principle1.Guideline1_1.1_1_1.H67": "image-alt",
+     "Principle1.Guideline1_1.1_1_1.H36": "input-image-alt",
+     "Principle1.Guideline1_1.1_1_1.H2": "image-redundant-alt",
+     "Principle1.Guideline1_1.1_1_1.H53": "object-alt",
+     "Principle1.Guideline1_1.1_1_1.G94": "image-alt",
+     "Principle1.Guideline1_1.1_1_1.H24": "area-alt",
+     "Principle2.Guideline2_4.2_4_1.H64": "frame-title",
+     "Principle2.Guideline2_4.2_4_1.G1": "bypass",
+     "Principle2.Guideline2_4.2_4_1.G124": "bypass",
+     "Principle2.Guideline2_4.2_4_2.H25": "document-title",
+     "Principle2.Guideline2_4.2_4_4.H77": "link-name",
+     "Principle1.Guideline1_1.1_1_1.H30": "link-name",
+     "Principle2.Guideline2_4.2_4_6.G197": "label",
+     "Principle2.Guideline2_1.2_1_1.G202": "scrollable-region-focusable",
+     "Principle3.Guideline3_1.3_1_1.H57": "html-has-lang",
+     "Principle3.Guideline3_1.3_1_1.H57.2": "html-has-lang",
+     "Principle3.Guideline3_1.3_1_1.H57.3": "html-lang-valid",
+     "Principle3.Guideline3_1.3_1_1.H57.3.Lang": "html-lang-valid",
+     "Principle3.Guideline3_2.3_2_1.G107": "select-name",
+     "Principle3.Guideline3_3.3_3_2.G131": "label",
+     "Principle4.Guideline4_1.4_1_1.F77": "duplicate-id",
+     "Principle4.Guideline4_1.4_1_2.H91": "button-name",
+     "Principle4.Guideline4_1.4_1_2.H91.A": "link-name",
+     "Principle4.Guideline4_1.4_1_2.H91.Button": "button-name",
+     "Principle4.Guideline4_1.4_1_2.H91.InputText": "label",
+     "Principle4.Guideline4_1.4_1_2.H91.Select": "select-name"
+   }
+ }
@@ -8,6 +8,11 @@

  - [Pipeline overview](#pipeline-overview)
  - [Stage 1: DOM scanner](#stage-1-dom-scanner)
+   - [axe-core](#axe-core)
+   - [CDP checks](#cdp-checks)
+   - [pa11y](#pa11y)
+   - [Merge and deduplication](#merge-and-deduplication)
+ - [Stage 1b: Source scanner](#optional-source-scanner)
  - [Stage 2: Analyzer](#stage-2-analyzer)
  - [Stage 3: Report builders](#stage-3-report-builders)
  - [Assets and rule intelligence](#assets-and-rule-intelligence)
@@ -23,59 +28,133 @@ The engine operates as a three-stage pipeline. Each stage is an independent Node
  Target URL


- ┌─────────────────────────────┐
- │ Stage 1: DOM Scanner        │  Playwright + axe-core
- │ dom-scanner.mjs             │  Route discovery + WCAG scan
- └──────────────┬──────────────┘
-                │ a11y-scan-results.json
-                ▼
- ┌─────────────────────────────┐
- │ Stage 1b: Source Scanner    │  Static regex analysis
- │ source-scanner.mjs          │  (optional — requires --project-dir)
- └──────────────┬──────────────┘
-                │ merges into a11y-findings.json
-                ▼
- ┌─────────────────────────────┐
- │ Stage 2: Analyzer           │  Fix intelligence enrichment
- │ analyzer.mjs                │  intelligence.json + guardrails
- └──────────────┬──────────────┘
-                │ a11y-findings.json
-                ▼
- ┌─────────────────────────────┐
- │ Stage 3: Report Builders    │  Parallel rendering
- │ md / html / pdf / checklist │
- └──────────────┬──────────────┘
-                │
-     ┌──────────┼──────────┬──────────────┐
-     ▼          ▼          ▼              ▼
- remediation report     report        checklist
-    .md       .html      .pdf           .html
+ ┌─────────────────────────────────┐
+ │ Stage 1: DOM Scanner            │  Three engines per route:
+ │ dom-scanner.mjs                 │
+ │                                 │
+ │  ┌──────────┐  ┌──────┐         │
+ │  │ axe-core │  │ CDP  │         │  Playwright Chromium
+ │  └────┬─────┘  └──┬───┘         │
+ │       │           │             │
+ │  ┌────▼───────────▼────┐        │
+ │  │        pa11y        │        │  Puppeteer Chrome
+ │  └────────┬────────────┘        │
+ │           │                     │
+ │  ┌────────▼────────────┐        │
+ │  │    Merge & Dedup    │        │
+ │  └────────┬────────────┘        │
+ └───────────┼─────────────────────┘
+             │ a11y-scan-results.json
+             │ progress.json
+             ▼
+ ┌─────────────────────────────────┐
+ │ Stage 1b: Source Scanner        │  Static regex analysis
+ │ source-scanner.mjs              │  (optional — requires --project-dir)
+ └───────────┬─────────────────────┘
+             │ merges into a11y-findings.json
+             ▼
+ ┌─────────────────────────────────┐
+ │ Stage 2: Analyzer               │  Fix intelligence enrichment
+ │ analyzer.mjs                    │  intelligence.json + guardrails
+ └───────────┬─────────────────────┘
+             │ a11y-findings.json
+             ▼
+ ┌─────────────────────────────────┐
+ │ Stage 3: Report Builders        │  Parallel rendering
+ │ md / html / pdf / checklist     │
+ └───────────┬─────────────────────┘
+             │
+     ┌───────┼──────────┬──────────────┐
+     ▼       ▼          ▼              ▼
+ remediation report     report     checklist
+    .md      .html      .pdf         .html
  ```

  ## Stage 1: DOM scanner

  **Script**: `scripts/engine/dom-scanner.mjs`

- Launches a Playwright-controlled Chromium browser and runs axe-core against each discovered route.
+ Launches a Playwright-controlled Chromium browser, discovers routes, and runs three independent accessibility engines against each page. Results are merged and deduplicated before output.
+
+ ### Route discovery

- **Route discovery**:
  - If the site exposes a `sitemap.xml`, all listed URLs are scanned (up to `--max-routes`).
  - Otherwise, BFS crawl starting from `--base-url`, following same-origin `<a href>` links up to `--crawl-depth` levels deep.
  - Routes are deduplicated and normalized before scanning.
+ - 3 parallel browser tabs scan routes concurrently (~2-3x faster than sequential).
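Here is a minimal sketch of the BFS fallback described above. It assumes a `getLinks(url)` callback that returns the raw `href` values found on a page; in the real scanner these would come from the rendered DOM. `discoverRoutes` and its option names are hypothetical. The sketch keeps only same-origin links, strips fragments and trailing slashes, and stops at `maxRoutes` or after `crawlDepth` levels.

```javascript
// Hypothetical BFS route discovery (used when no sitemap.xml is found).
// `getLinks(url)` is an assumed callback returning the raw hrefs on a page.
function discoverRoutes(baseUrl, getLinks, { maxRoutes = 10, crawlDepth = 2 } = {}) {
  const origin = new URL(baseUrl).origin;

  // Normalize: resolve relative hrefs, drop #fragments and trailing slashes.
  const normalize = (href, from) => {
    const u = new URL(href, from);
    u.hash = "";
    if (u.pathname.length > 1) u.pathname = u.pathname.replace(/\/+$/, "");
    return u;
  };

  const start = normalize(baseUrl, baseUrl);
  const seen = new Set([start.pathname]);
  const routes = [start.pathname];
  let frontier = [start.href];

  // Each pass follows links one level deeper than the previous one.
  for (let depth = 0; depth < crawlDepth && routes.length < maxRoutes; depth++) {
    const next = [];
    for (const pageUrl of frontier) {
      for (const href of getLinks(pageUrl)) {
        let u;
        try {
          u = normalize(href, pageUrl);
        } catch {
          continue; // skip malformed hrefs
        }
        if (u.origin !== origin || seen.has(u.pathname)) continue; // same-origin, no repeats
        seen.add(u.pathname);
        routes.push(u.pathname);
        next.push(u.href);
        if (routes.length >= maxRoutes) return routes;
      }
    }
    frontier = next;
  }
  return routes;
}
```

Deduplicating by pathname means `/about/` and `/about#team` count as one route. Query strings are ignored, which is a simplification.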
+
+ ### axe-core
+
+ **Dependency**: `@axe-core/playwright`
+
+ The primary engine. Injects axe-core into the live page via Playwright and runs WCAG 2.2 A/AA tag checks. Covers the majority of automatable WCAG success criteria (~80+ rules).
+
+ - Configurable via `--axe-tags` (default: `wcag2a,wcag2aa,wcag21a,wcag21aa,wcag22a,wcag22aa`)
+ - Supports `--only-rule` for focused single-rule audits
+ - Supports `--exclude-selectors` to skip specific elements
+
+ ### CDP checks
+
+ **Dependency**: Playwright's built-in CDP session (`page.context().newCDPSession(page)`)
+
+ Queries the browser's full accessibility tree via Chrome DevTools Protocol. Catches issues axe may miss because it operates on the computed accessibility tree rather than the DOM:
+
+ - **Missing accessible names** — interactive elements (`button`, `link`, `textbox`, `combobox`, etc.) with empty names in the accessibility tree
+ - **aria-hidden on focusable elements** — elements that are focusable but hidden from assistive technology
+
+ CDP findings use axe-compatible violation format with `source: "cdp"` for downstream processing.
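The missing-name check can be illustrated with a small function over the `nodes` array returned by the CDP command `Accessibility.getFullAXTree`. This sketch rests on several assumptions:

- `findMissingNames` is a hypothetical name.
- The role list is copied from the `interactiveRoles` array in the CDP rules asset.
- Each node's `target` is a placeholder built from the CDP `backendDOMNodeId`, because the accessibility tree does not carry CSS selectors.

```javascript
// Hypothetical sketch of the "interactive-no-name" condition over the nodes
// returned by the CDP command Accessibility.getFullAXTree.
const INTERACTIVE_ROLES = new Set([
  "button", "link", "textbox", "combobox", "listbox",
  "menuitem", "tab", "checkbox", "radio", "switch", "slider",
]);

function findMissingNames(axNodes) {
  const offenders = axNodes.filter((node) => {
    if (node.ignored) return false; // not exposed to assistive technology
    const role = node.role && node.role.value;
    const name = String((node.name && node.name.value) || "").trim();
    return INTERACTIVE_ROLES.has(role) && name === "";
  });
  if (offenders.length === 0) return [];

  // One axe-style violation grouping every offending node, tagged source: "cdp".
  return [{
    id: "cdp-missing-accessible-name",
    impact: "serious",
    tags: ["wcag2a", "wcag412", "cdp-check"],
    help: "Interactive elements must have an accessible name",
    source: "cdp",
    nodes: offenders.map((n) => ({
      // Placeholder target: the AX tree has backend node ids, not CSS selectors.
      target: [`backendNodeId:${n.backendDOMNodeId}`],
      failureSummary: `Element with role "${n.role.value}" has no accessible name in the accessibility tree`,
    })),
  }];
}
```

In a live scan, the input could come from two calls:

- `const session = await page.context().newCDPSession(page);`
- `const { nodes } = await session.send("Accessibility.getFullAXTree");`

The aria-hidden-on-focusable rule would need a second predicate and is left out of this sketch.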
+
+ ### pa11y
+
+ **Dependency**: `pa11y` (which uses Puppeteer + Chrome internally)
+
+ Runs Squiz's HTML CodeSniffer against each page URL. Catches WCAG violations that axe and CDP may miss:
+
+ - Heading hierarchy issues
+ - Link purpose violations
+ - Form label associations
+ - Additional WCAG2AA/WCAG2AAA checks from HTML CodeSniffer's rule set
+
+ pa11y requires its own Chrome installation (`npx puppeteer browsers install chrome`), separate from Playwright's Chromium. If that Chrome is missing, pa11y fails with a non-fatal warning and the scan continues with axe + CDP only.
+
+ pa11y findings use axe-compatible violation format with `source: "pa11y"` for downstream processing.
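This sketch shows one way a pa11y issue might be normalized into the axe-compatible shape. It uses the `ignoreByPrinciple`, `impactMap`, and `equivalenceMap` keys from the pa11y rules asset, abbreviated below. `normalizePa11yIssue` is a hypothetical name. The input fields (`code`, `typeCode`, `message`, `context`, `selector`) are those of a pa11y result issue.

Two details are assumptions rather than documented behavior:

- Asset keys are matched after removing pa11y's leading standard prefix (such as `WCAG2AA.`).
- Codes with extra suffixes (such as `...H91.A.Empty`) fall back to the longest matching prefix in `equivalenceMap`.

```javascript
// Hypothetical normalizer from a pa11y issue to an axe-style violation.
// PA11Y_CONFIG is an abbreviated copy of the pa11y rules asset.
const PA11Y_CONFIG = {
  ignoreByPrinciple: [
    "Principle1.Guideline1_4.1_4_3.G18.Fail",
    "Principle4.Guideline4_1.4_1_2.H91.A.NoContent",
  ],
  impactMap: { 1: "serious", 2: "moderate", 3: "minor" },
  equivalenceMap: {
    "Principle1.Guideline1_3.1_3_1.H42": "heading-order",
    "Principle1.Guideline1_1.1_1_1.H37": "image-alt",
    "Principle4.Guideline4_1.4_1_2.H91.A": "link-name",
  },
};

function normalizePa11yIssue(issue, config = PA11Y_CONFIG) {
  // pa11y codes start with the standard, e.g. "WCAG2AA.Principle1...".
  const code = issue.code.replace(/^WCAG2A{1,3}\./, "");
  if (config.ignoreByPrinciple.includes(code)) return null;

  // Exact match first, then progressively shorter dotted prefixes.
  let axeId = null;
  const parts = code.split(".");
  while (parts.length > 0 && axeId === null) {
    axeId = config.equivalenceMap[parts.join(".")] || null;
    parts.pop();
  }

  return {
    id: axeId || code, // axe rule id when an equivalent is known
    impact: config.impactMap[issue.typeCode] || "minor", // fallback is an assumption
    help: issue.message,
    source: "pa11y",
    nodes: [{ target: [issue.selector], html: issue.context }],
  };
}
```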
+
+ ### Merge and deduplication
+
+ After all three engines complete, `mergeViolations()` combines findings and removes cross-engine duplicates:
+
+ 1. **axe findings** are added first as the baseline
+ 2. **CDP findings** are checked against axe equivalents (e.g. `cdp-missing-accessible-name` maps to `button-name`, `link-name`, `input-name`, `aria-command-name`). Only findings with no axe equivalent are added.
+ 3. **pa11y findings** are checked against existing selectors. If the same element is already flagged by axe or CDP, the pa11y finding is dropped.
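The three steps can be sketched as a single function. This is not the package's `mergeViolations()`. It is an illustration under two assumptions:

- Violations follow axe's `{ id, nodes: [{ target }] }` shape.
- The CDP check works at rule level: a CDP finding is dropped whenever axe reported any of its `axeEquivalents` on the page.

Selectors are compared as plain strings joined from each `target` array.

```javascript
// Illustrative merge; not the package's mergeViolations().
// Violations are axe-shaped: { id, nodes: [{ target: [selector, ...] }] }.
// EQUIVALENTS mirrors the `axeEquivalents` arrays in the CDP rules asset.
const EQUIVALENTS = {
  "cdp-missing-accessible-name": ["button-name", "link-name", "input-name", "aria-command-name"],
  "cdp-aria-hidden-focusable": ["aria-hidden-focus"],
};

const selectorOf = (node) => node.target.join(" ");

function mergeViolations(axe, cdp, pa11y) {
  // 1. axe findings are the baseline.
  const merged = [...axe];
  const axeIds = new Set(axe.map((v) => v.id));

  // 2. Keep a CDP finding only if axe reported none of its equivalent rules.
  for (const v of cdp) {
    const equivalents = EQUIVALENTS[v.id] || [];
    if (!equivalents.some((id) => axeIds.has(id))) merged.push(v);
  }

  // 3. Keep pa11y nodes whose selector is not already flagged by axe or CDP.
  const flagged = new Set(merged.flatMap((v) => v.nodes.map(selectorOf)));
  for (const v of pa11y) {
    const fresh = v.nodes.filter((n) => !flagged.has(selectorOf(n)));
    if (fresh.length > 0) merged.push({ ...v, nodes: fresh });
  }
  return merged;
}
```

Filtering pa11y nodes, rather than whole violations, keeps a pa11y finding as long as it flags at least one element that axe and CDP missed.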
+
+ The merged violations are written to `a11y-scan-results.json` per route.
+
+ ### Progress tracking
+
+ The scanner writes `progress.json` in real time as each engine runs. Integrations (like `a11y-scanner`) use this file to drive a live progress UI:
+
+ ```json
+ {
+   "steps": {
+     "page":  { "status": "done", "updatedAt": "..." },
+     "axe":   { "status": "done", "updatedAt": "...", "found": 8 },
+     "cdp":   { "status": "done", "updatedAt": "...", "found": 3 },
+     "pa11y": { "status": "done", "updatedAt": "...", "found": 2 },
+     "merge": { "status": "done", "updatedAt": "...", "axe": 8, "cdp": 3, "pa11y": 2, "merged": 11 }
+   },
+   "currentStep": "merge"
+ }
+ ```
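An integration that polls this file needs only a small reducer to drive a status line. `summarizeProgress` is a hypothetical helper. It assumes the step names and counters shown in the example, and it treats `axe + cdp + pa11y - merged` in the `merge` step as the number of cross-engine duplicates removed.

```javascript
// Hypothetical reader for progress.json contents (already parsed).
function summarizeProgress(progress) {
  const done = Object.entries(progress.steps)
    .filter(([, step]) => step.status === "done")
    .map(([name]) => name);

  // Assumes the merge counters are comparable, so the gap is the dedup count.
  const m = progress.steps.merge;
  const duplicatesRemoved =
    m && m.status === "done" ? m.axe + m.cdp + m.pa11y - m.merged : null;

  return { currentStep: progress.currentStep, done, duplicatesRemoved };
}
```

For the example above, it reports all five steps as done and 2 duplicates removed (8 + 3 + 2 - 11).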

- **Scanning**:
- - 3 parallel browser tabs scan routes concurrently (~2–3× faster than sequential).
- - axe-core 4.11+ runs WCAG 2.2 A, AA, and best-practice tag sets.
- - Screenshots of affected elements are captured for each violation.
- - `--color-scheme`, `--viewport`, `--wait-until`, and `--wait-ms` control the browser environment.
+ ### Screenshots

- **Output**: `a11y-scan-results.json` (raw axe results per route with DOM snapshots).
+ After merging, element screenshots are captured for each violation. Non-visible elements (`<meta>`, `<link>`, `<script>`, etc.) are automatically skipped. Screenshots are stored in `.audit/screenshots/` and referenced by each violation's `screenshot_path` field.
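Here is a sketch of the skip check. The six tag names come from the 0.1.3 changelog, and `shouldScreenshot` is a hypothetical helper. The sketch infers the tag from the last compound selector, after dropping attribute filters and pseudo-class arguments. That inference is an assumption; the real scanner may inspect the element directly instead.

```javascript
// Hypothetical screenshot guard; the tag list matches the 0.1.3 changelog.
const NON_VISIBLE_TAGS = new Set(["meta", "link", "style", "script", "title", "base"]);

function shouldScreenshot(selector) {
  const simplified = selector
    .replace(/\[[^\]]*\]/g, "") // drop attribute filters, e.g. [name="viewport"]
    .replace(/\([^)]*\)/g, "")  // drop pseudo-class arguments, e.g. :not(...)
    .trim();
  // Last compound selector after a combinator (>, +, ~) or descendant space.
  const last = simplified.split(/\s*[>+~]\s*|\s+/).pop();
  const tag = (last.match(/^[a-z][a-z0-9-]*/i) || [""])[0].toLowerCase();
  return !NON_VISIBLE_TAGS.has(tag);
}
```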

  ### Optional: Source scanner

  **Script**: `scripts/engine/source-scanner.mjs` — runs when `--project-dir` is set and `--skip-patterns` is not.

- Performs static analysis of source files for accessibility issues axe cannot detect at runtime (e.g. focus outline suppression, missing alt text in templates). Uses regex patterns from `assets/remediation/code-patterns.json` scoped to framework-specific file boundaries from `assets/remediation/source-boundaries.json`.
+ Performs static analysis of source files for accessibility issues no runtime engine can detect (e.g. focus outline suppression, missing alt text in templates). Uses regex patterns from `assets/remediation/code-patterns.json` scoped to framework-specific file boundaries from `assets/remediation/source-boundaries.json`.

  Findings are classified as `confirmed` (pattern unambiguously matches) or `potential` (requires human verification).

@@ -83,13 +162,13 @@ Findings are classified as `confirmed` (pattern unambiguously matches) or `poten

  **Script**: `scripts/engine/analyzer.mjs`

- Reads `a11y-scan-results.json` and enriches each violation with:
+ Reads `a11y-scan-results.json` (which contains merged axe + CDP + pa11y results) and enriches each violation with:

- - **Fix intelligence** from `assets/remediation/intelligence.json` — 106 axe-core rules with code snippets, MDN links, framework-specific notes, and WCAG criterion mapping.
+ - **Fix intelligence** from `assets/remediation/intelligence.json` — 106 axe-core rules with code snippets, MDN links, framework-specific notes, and WCAG criterion mapping. CDP and pa11y findings receive generic enrichment based on their rule structure.
  - **Selector scoring** — picks the most stable selector from axe's `nodes` list. Priority: `#id` > `[data-*]` > `[aria-*]` > `[type=]`, with a penalty for Tailwind utility classes.
  - **Framework context** — `assets/discovery/stack-detection.json` fingerprints the DOM to detect framework and CMS. Per-finding `framework_notes` and `cms_notes` are filtered to the detected stack.
  - **Guardrails** — `assets/remediation/guardrails.json` defines scope rules that prevent agents from touching backend code, third-party scripts, or minified files.
- - **Compliance scoring** — `assets/reporting/compliance-config.json` weights findings by severity to produce a 0–100 score with grade thresholds.
+ - **Compliance scoring** — `assets/reporting/compliance-config.json` weights findings by severity to produce a 0-100 score with grade thresholds.
  - **Persona impact groups** — `assets/reporting/wcag-reference.json` maps findings to disability personas (visual, motor, cognitive, etc.).
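The selector-scoring priority can be expressed as a weighted heuristic. In this sketch, `scoreSelector` and `pickStableSelector` are hypothetical names. The weights (100, 50, 25, 10, and a 15-point penalty per utility class) and the utility-class pattern are illustrative assumptions. Only the ordering `#id` > `[data-*]` > `[aria-*]` > `[type=]` and the Tailwind penalty come from the description above.

```javascript
// Hypothetical selector scoring: #id > [data-*] > [aria-*] > [type=],
// minus a penalty per Tailwind-style utility class. Weights are illustrative.
const UTILITY_CLASS =
  /^(?:[a-z0-9]+:)*-?(?:p[xytblr]?|m[xytblr]?|w|h|gap|text|bg|flex|grid|rounded|shadow|border|font|items|justify)(?:-|$)/;

function scoreSelector(selector) {
  const outsideAttrs = selector.replace(/\[[^\]]*\]/g, ""); // ignore "#" or "." inside attribute values
  let score = 0;
  if (/#[\w-]+/.test(outsideAttrs)) score += 100;
  if (/\[data-[\w-]+/.test(selector)) score += 50;
  if (/\[aria-[\w-]+/.test(selector)) score += 25;
  if (/\[type=/.test(selector)) score += 10;

  // Class tokens may contain CSS escapes such as "md\:px-4"; unescape before testing.
  const classes = [...outsideAttrs.matchAll(/\.((?:\\.|[\w-])+)/g)].map((m) => m[1].replace(/\\/g, ""));
  score -= 15 * classes.filter((c) => UTILITY_CLASS.test(c)).length;
  return score;
}

function pickStableSelector(candidates) {
  return candidates.reduce((best, s) => (scoreSelector(s) > scoreSelector(best) ? s : best));
}
```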

  **Output**: `a11y-findings.json` — enriched findings array with all intelligence fields.
@@ -116,7 +195,7 @@ Assets are static JSON files bundled with the package under `assets/`. They are
  | Asset | Purpose |
  | :--- | :--- |
  | `reporting/compliance-config.json` | Score weights, grade thresholds, legal regulation list |
- | `reporting/wcag-reference.json` | WCAG criterion map, persona config, persona→rule mapping |
+ | `reporting/wcag-reference.json` | WCAG criterion map, persona config, persona-rule mapping |
  | `reporting/manual-checks.json` | 41 manual checks for the WCAG checklist |
  | `discovery/crawler-config.json` | BFS crawl defaults (timeouts, concurrency) |
  | `discovery/stack-detection.json` | Framework/CMS DOM fingerprints |
@@ -7,6 +7,7 @@
  ## Table of Contents

  - [Basic usage](#basic-usage)
+ - [Prerequisites](#prerequisites)
  - [Flag groups](#flag-groups)
    - [Targeting & scope](#targeting--scope)
    - [Audit intelligence](#audit-intelligence)
@@ -33,6 +34,22 @@ The only required flag is `--base-url`. All other flags are optional.

  ---

+ ## Prerequisites
+
+ The engine uses two separate browser installations:
+
+ ```bash
+ # Required — used by axe-core and CDP checks
+ npx playwright install chromium
+
+ # Required for pa11y — uses Puppeteer's Chrome (separate from Playwright)
+ npx puppeteer browsers install chrome
+ ```
+
+ If Puppeteer Chrome is missing, pa11y checks fail with a non-fatal warning and the scan continues with axe-core + CDP only.
+
+ ---
+
  ## Flag groups

  ### Targeting & scope
@@ -43,7 +60,7 @@ Controls what gets scanned.
  | :--- | :--- | :--- | :--- |
  | `--base-url` | `<url>` | (Required) | Starting URL. Must include protocol (`https://` or `http://`). |
  | `--max-routes` | `<num>` | `10` | Maximum unique same-origin paths to discover and scan. |
- | `--crawl-depth` | `<num>` | `2` | How deep to follow links during BFS discovery (1–3). Has no effect when `--routes` is set. |
+ | `--crawl-depth` | `<num>` | `2` | How deep to follow links during BFS discovery (1-3). Has no effect when `--routes` is set. |
  | `--routes` | `<csv>` | — | Explicit paths to scan (e.g. `/,/about,/contact`). Overrides auto-discovery entirely. |
  | `--project-dir` | `<path>` | — | Path to the audited project source. Enables the source code pattern scanner and framework auto-detection from `package.json`. |

@@ -64,6 +81,7 @@ Controls how findings are interpreted and filtered.
  | `--only-rule` | `<id>` | — | Run a single axe rule ID only. Useful for focused re-audits after fixing a specific issue. |
  | `--ignore-findings` | `<csv>` | — | Comma-separated list of axe rule IDs to suppress from output entirely. |
  | `--exclude-selectors` | `<csv>` | — | CSS selectors to skip. Elements matching these selectors are excluded from axe scanning. |
+ | `--axe-tags` | `<csv>` | `wcag2a,wcag2aa,wcag21a,wcag21aa,wcag22a,wcag22aa` | axe-core WCAG tag filter. Also determines the pa11y standard (`WCAG2A`, `WCAG2AA`, or `WCAG2AAA`). |
  | `--framework` | `<name>` | — | Override auto-detected framework. Affects which fix notes and source boundaries are applied. |

  **Supported `--framework` values**: `nextjs`, `gatsby`, `react`, `nuxt`, `vue`, `angular`, `astro`, `svelte`, `shopify`, `wordpress`, `drupal`.
@@ -79,7 +97,7 @@ Controls browser behavior during scanning.
  | `--color-scheme` | `light\|dark` | `light` | Emulates `prefers-color-scheme` media query. |
  | `--wait-until` | `domcontentloaded\|load\|networkidle` | `domcontentloaded` | Playwright page load strategy. Use `networkidle` for SPAs with async rendering. |
  | `--viewport` | `<WxH>` | `1280x800` | Browser viewport in pixels (e.g. `375x812` for mobile, `1440x900` for desktop). |
- | `--wait-ms` | `<num>` | `2000` | Fixed delay (ms) after page load before axe runs. Useful when JS renders content after `DOMContentLoaded`. |
+ | `--wait-ms` | `<num>` | `2000` | Fixed delay (ms) after page load before the engines run. Useful when JS renders content after `DOMContentLoaded`. |
  | `--timeout-ms` | `<num>` | `30000` | Network timeout per page load (ms). |
  | `--headed` | — | `false` | Launch browser in visible mode. Useful for debugging page rendering issues. |
  | `--affected-only` | — | `false` | Re-scan only routes that had violations in the previous scan. Reads `.audit/a11y-scan-results.json` to determine affected routes. Falls back to full scan if no prior results exist. |
@@ -190,6 +208,16 @@ a11y-audit \
  --project-dir .
  ```

+ ### Custom axe-core WCAG tags
+
+ ```bash
+ # Only WCAG 2.0 A checks
+ a11y-audit --base-url https://example.com --axe-tags wcag2a
+
+ # WCAG 2.0 A, AA, and AAA checks
+ a11y-audit --base-url https://example.com --axe-tags wcag2a,wcag2aa,wcag2aaa
+ ```
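The flag table says `--axe-tags` also selects the pa11y standard, but not how. One plausible rule is to take the highest conformance level requested in any `wcag2*` tag and fall back to `WCAG2AA` when none is present. This sketch implements that rule as an assumption, and `pa11yStandardFromTags` is a hypothetical name.

```javascript
// Hypothetical mapping from --axe-tags to a pa11y standard: use the highest
// level found in any wcag2* tag, defaulting to WCAG2AA if there is none.
function pa11yStandardFromTags(csv) {
  const levels = csv
    .split(",")
    .map((tag) => tag.trim().toLowerCase())
    .map((tag) => /^wcag2\d?(a{1,3})$/.exec(tag))
    .filter(Boolean)
    .map((match) => match[1].length); // "a" -> 1, "aa" -> 2, "aaa" -> 3
  const level = levels.length > 0 ? Math.max(...levels) : 2;
  return "WCAG2" + "A".repeat(level);
}
```

Under that rule, the two commands above would run pa11y with `WCAG2A` and `WCAG2AAA` respectively.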
+

  ---

  ## Exit codes