@diegovelasquezweb/a11y-engine 0.8.1 → 0.8.3

package/CHANGELOG.md CHANGED
@@ -5,6 +5,76 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+ ## [0.8.2] — 2026-03-16
+
+ ### Changed
+
+ - **Smarter AI source file selection** — `fetchSourceFilesForFindings` now scores candidate files by how many terms extracted from the finding's selector, class names, IDs, and title match the file path. Files most relevant to the specific failing element are fetched first instead of picking the first 3 files by extension.
+ - Extracted `extractSearchTermsFromFinding()` and `scoreFilePath()` helpers for reusable relevance scoring logic.
+
+ ---
+
+ ## [0.8.1] — 2026-03-16
+
+ ### Added
+
+ - **Custom AI system prompt** — `enrichWithAI()` now accepts `options.systemPrompt` to override the default Claude system prompt at runtime.
+ - `enrich.mjs` reads `AI_SYSTEM_PROMPT` env var and passes it to `enrichWithAI()` — enabling per-scan prompt customization without code changes.
+ - `audit.mjs` forwards `AI_SYSTEM_PROMPT` env var to the `enrich.mjs` child process.
+
+ ---
+
+ ## [0.8.0] — 2026-03-16
+
+ ### Changed
+
+ - **AI enrichment no longer overwrites original fix** — `enrich.mjs` now preserves the original `fix_description`/`fix_code` from the engine and stores Claude's output in separate fields: `ai_fix_description`, `ai_fix_code`, `ai_fix_code_lang`. Findings improved by AI are flagged with `aiEnhanced: true`.
+ - **AI system prompt rewritten** — Claude is now explicitly instructed to go beyond the generic fix: explain why the issue matters for real users, what specifically to look for in the codebase, and provide a production-quality code example different from the existing one.
+ - Default AI model updated to `claude-haiku-4-5-20251001`.
+
+ ---
+
+ ## [0.7.9] — 2026-03-16
+
+ ### Added
+
+ - **AI enrichment CLI step** — `audit.mjs` now runs `src/ai/enrich.mjs` after the analyzer step when `ANTHROPIC_API_KEY` env var is present. Non-fatal: if AI fails, the pipeline continues with unenriched findings.
+ - `src/ai/enrich.mjs` — new CLI script that reads `a11y-findings.json`, calls `enrichWithAI()`, and writes enriched findings back. Reads `A11Y_REPO_URL` and `GH_TOKEN` env vars for repo-aware enrichment.
+ - `src/ai/claude.mjs` — Claude AI enrichment module. Enriches Critical and Serious findings with context-aware fix descriptions and code snippets. Uses `claude-haiku-4-5-20251001` by default. Fetches source files from the GitHub repo when `repoUrl` is available.
+
+ ---
+
+ ## [0.7.8] — 2026-03-16
+
+ ### Fixed
+
+ - **pa11y ruleId normalization** — pa11y violation IDs (e.g. `WCAG2AAA.Principle1.Guideline1_4.1_4_6.G17`) are now normalized to a short, readable form (e.g. `pa11y-g17`) by taking only the last segment of the dotted code. Previously the full dotted path was used, producing unreadable badges like `Pa11y Wcag2aaa Principle1 Guideline1 4 1 4 6 G17`.
+
+ ---
+
+ ## [0.7.7] — 2026-03-15
+
+ ### Added
+
+ - **`--repo-url` and `--github-token` CLI flags** — `audit.mjs` now accepts `--repo-url <github-url>` and `--github-token <token>`. When a repo URL is provided, the engine fetches `package.json` via the GitHub API to detect the project framework before running the analyzer, and passes the detected framework to both the analyzer and the source pattern scanner. No `git clone` required.
+ - `source-scanner.mjs` CLI now accepts `--repo-url` and `--github-token`. When `--repo-url` is provided (without `--project-dir`), it runs `scanPatternRemote()` against the GitHub API instead of the local filesystem.
+ - `detectProjectContext()` is now called in `audit.mjs` when a remote repo is provided, enabling framework-aware fix suggestions without a local clone.
+
+ ### Changed
+
+ - `source-scanner.mjs`: `--project-dir` is no longer required when `--repo-url` is provided. `main()` is now async to support remote API calls.
+ - `audit.mjs`: pattern scanning is now triggered when either `--project-dir` or `--repo-url` is provided.
+
+ ---
+
+ ## [0.7.6] — 2026-03-15
+
+ ### Changed
+
+ - HTML report renderer: updated Tailwind class syntax (`flex-shrink-0` → `shrink-0`, `bg-gradient-to-br` → `bg-linear-to-br`, `max-h-[360px]` → `max-h-90`).
+
+ ---
+
  ## [0.4.2] — 2026-03-15
 
  ### Fixed
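The 0.7.8 entry in the hunk above describes the pa11y normalization as keeping only the last segment of the dotted code. A minimal sketch of that rule follows, assuming a hypothetical helper name `normalizePa11yRuleId` and a hypothetical `unknown` fallback for empty input; the engine's actual function in `dom-scanner.mjs` is not part of this diff and may differ.

```javascript
// Hypothetical helper, not the engine's actual implementation.
// Keeps only the last dot-separated segment of a pa11y code and
// prefixes it with "pa11y-", as the 0.7.8 changelog entry describes.
function normalizePa11yRuleId(code) {
  const segments = String(code).split(".").filter(Boolean);
  // The "unknown" fallback is an assumption for empty input.
  const last = segments.length > 0 ? segments[segments.length - 1] : "unknown";
  return `pa11y-${last.toLowerCase()}`;
}

// Example from the changelog entry:
console.log(normalizePa11yRuleId("WCAG2AAA.Principle1.Guideline1_4.1_4_6.G17")); // "pa11y-g17"
```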
package/README.md CHANGED
@@ -50,7 +50,7 @@ import {
 
  #### runAudit
 
- Runs the full scan pipeline: route discovery, scan, merge, analyze, and optional AI enrichment. Returns a payload ready for `getFindings`.
+ Runs the full scan pipeline: route discovery, scan, merge, analyze, AI enrichment (when configured), and optional source pattern scanning. Returns a payload ready for `getFindings`.
 
  ```ts
  const payload = await runAudit({
@@ -58,6 +58,14 @@ const payload = await runAudit({
    maxRoutes: 5,
    axeTags: ["wcag2a", "wcag2aa", "best-practice"],
    engines: { axe: true, cdp: true, pa11y: true },
+   repoUrl: "https://github.com/owner/repo", // optional — enables source pattern scan and stack detection from package.json
+   githubToken: process.env.GH_TOKEN, // optional — for private repos and higher GitHub API rate limits
+   ai: {
+     enabled: true,
+     apiKey: process.env.ANTHROPIC_API_KEY,
+     githubToken: process.env.GH_TOKEN,
+     systemPrompt: "Custom prompt...", // optional — overrides default Claude system prompt
+   },
    onProgress: (step, status, extra) => console.log(`${step}: ${status}`, extra),
  });
  ```
@@ -125,10 +133,39 @@ These functions expose scanner help content, persona explanations, conformance l
 
  See [API Reference](docs/api-reference.md) for exact options and return types.
 
- ## Optional CLI
+ ## CLI
 
- If you need terminal execution, the package also exposes `a11y-audit`.
- See the [CLI Handbook](docs/cli-handbook.md) for command flags and examples.
+ The package exposes an `a11y-audit` binary for terminal execution.
+
+ ```bash
+ # Basic scan
+ pnpm exec a11y-audit --base-url https://example.com
+
+ # With source code pattern scanning via GitHub API (no clone)
+ pnpm exec a11y-audit --base-url https://example.com \
+ --repo-url https://github.com/owner/repo \
+ --github-token ghp_...
+
+ # With AI enrichment (set ANTHROPIC_API_KEY env var)
+ ANTHROPIC_API_KEY=sk-ant-... pnpm exec a11y-audit --base-url https://example.com
+
+ # With custom AI system prompt
+ AI_SYSTEM_PROMPT="You are..." ANTHROPIC_API_KEY=sk-ant-... pnpm exec a11y-audit --base-url https://example.com
+ ```
+
+ See the [CLI Handbook](docs/cli-handbook.md) for all flags and examples.
+
+ ## AI enrichment
+
+ When `ANTHROPIC_API_KEY` is set, the engine runs a post-scan enrichment step that sends Critical and Serious findings to Claude. Claude generates:
+
+ - A specific fix description referencing the actual selector, colors, and violation data
+ - A production-quality code snippet in the correct framework syntax
+ - Context-aware suggestions when repo source files are available
+
+ AI output is stored in separate fields (`ai_fix_description`, `ai_fix_code`) — the original engine fixes are always preserved. Findings improved by AI are flagged with `aiEnhanced: true`.
+
+ The system prompt is fully customizable via `options.ai.systemPrompt` (programmatic API) or the `AI_SYSTEM_PROMPT` env var (CLI).
 
  ## Documentation
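Because the README hunk above says AI output lives in separate fields next to the preserved engine fix, a consumer has to choose which one to display. The following is a minimal consumer-side sketch, assuming a hypothetical helper name `pickFix` and a hypothetical return shape; only the field names (`fix_description`, `fix_code`, `ai_fix_description`, `ai_fix_code`, `ai_fix_code_lang`, `aiEnhanced`) come from the diff.

```javascript
// Hypothetical consumer-side helper (not part of the package API).
// Prefers Claude's output when the finding is flagged aiEnhanced,
// otherwise falls back to the engine's original fix.
function pickFix(finding) {
  const useAi = finding.aiEnhanced === true && Boolean(finding.ai_fix_description);
  if (useAi) {
    return {
      source: "ai",
      description: finding.ai_fix_description,
      code: finding.ai_fix_code ?? null,
      lang: finding.ai_fix_code_lang ?? null,
    };
  }
  return {
    source: "engine",
    description: finding.fix_description ?? null,
    code: finding.fix_code ?? null,
    lang: null,
  };
}

const enriched = {
  fix_description: "Add an accessible name.",
  fix_code: "<button aria-label=\"Close\">",
  aiEnhanced: true,
  ai_fix_description: "Give the close button in the banner an aria-label.",
  ai_fix_code: "<button aria-label=\"Close banner\">",
  ai_fix_code_lang: "jsx",
};
console.log(pickFix(enriched).source); // "ai"
console.log(pickFix({ fix_description: "Add an accessible name." }).source); // "engine"
```

Keeping the fallback explicit means the same rendering code works whether or not `ANTHROPIC_API_KEY` was set for the scan.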
 
@@ -35,7 +35,7 @@ Runs route discovery, runtime scan, merge, analyzer enrichment, and optional AI
  | `skipPatterns` | `boolean` |
  | `screenshotsDir` | `string` |
  | `engines` | `{ axe?: boolean; cdp?: boolean; pa11y?: boolean }` |
- | `ai` | `{ enabled?: boolean; apiKey?: string; githubToken?: string; model?: string }` |
+ | `ai` | `{ enabled?: boolean; apiKey?: string; githubToken?: string; model?: string; systemPrompt?: string }` — `systemPrompt` overrides the default Claude prompt when set |
  | `onProgress` | `(step: string, status: string, extra?: Record<string, unknown>) => void` |
 
  Progress steps emitted via `onProgress`:
@@ -54,6 +54,8 @@ Progress steps emitted via `onProgress`:
 
  Returns: `Promise<ScanPayload>`
 
+ > **`ai_enriched_findings` fast path**: When AI enrichment runs, the engine appends `ai_enriched_findings` to the payload. `getFindings()` checks for this field first — if present, it returns the already-enriched findings directly without re-normalizing the raw `findings` array.
+
  ### `getFindings(input, options?)`
 
  Normalizes and enriches findings and returns sorted enriched findings.
@@ -33,12 +33,21 @@ flowchart TD
 
  M --> R[a11y-scan-results.json]
  R --> AN[Analyzer]
- AN --> F[a11y-findings.json]
 
- F --> MD[remediation.md]
- F --> HTML[report.html]
- F --> PDF[report.pdf]
- F --> CHK[checklist.html]
+ REPO[GitHub Repo] -->|fetchPackageJson| AN
+ REPO -->|scanPatternRemote| PAT[a11y-pattern-findings.json]
+
+ AN --> F[a11y-findings.json]
+ F --> AI{ANTHROPIC_API_KEY?}
+ AI -->|yes| CL[Claude AI enrichment]
+ AI -->|no| SKIP[skip]
+ CL --> F2[a11y-findings.json enriched]
+ SKIP --> F2
+
+ F2 --> MD[remediation.md]
+ F2 --> HTML[report.html]
+ F2 --> PDF[report.pdf]
+ F2 --> CHK[checklist.html]
  ```
 
  ## Execution Modes
@@ -75,7 +84,10 @@ flowchart LR
  | :--- | :--- |
  | `src/pipeline/dom-scanner.mjs` | Route discovery, engine execution (axe/CDP/pa11y), merge/dedup, progress updates, screenshots |
  | `src/enrichment/analyzer.mjs` | Rule enrichment, selector strategy, ownership hints, recommendations, scoring metadata |
- | `src/source-patterns/source-scanner.mjs` | Static source pattern detection for issues runtime engines cannot see |
+ | `src/ai/enrich.mjs` | CLI subprocess that runs AI enrichment after the analyzer. Reads `ANTHROPIC_API_KEY` and `AI_SYSTEM_PROMPT` env vars. Non-fatal. |
+ | `src/ai/claude.mjs` | Anthropic API client. Sends Critical/Serious findings to Claude and parses improved fix suggestions. Supports custom system prompt and repo source file context. |
+ | `src/core/github-api.mjs` | GitHub API client. Provides `fetchPackageJson`, `fetchRepoFile`, `listRepoFiles`, and `parseRepoUrl`. Used for remote repo scanning and AI source file fetching without cloning. |
+ | `src/source-patterns/source-scanner.mjs` | Source code pattern scanner. Works against local `--project-dir` or remote `--repo-url` via the GitHub API. |
  | `src/reports/*.mjs` | Report builders for markdown/html/pdf/checklist |
  | `src/reports/renderers/*.mjs` | Shared rendering and normalization helpers |
  | `src/core/asset-loader.mjs` | Centralized access to bundled assets |
@@ -10,9 +10,12 @@
  - [Prerequisites](#prerequisites)
  - [Flag groups](#flag-groups)
  - [Targeting & scope](#targeting--scope)
+ - [Repository & remote scanning](#repository--remote-scanning)
+ - [AI enrichment](#ai-enrichment)
  - [Audit intelligence](#audit-intelligence)
  - [Execution & emulation](#execution--emulation)
  - [Output generation](#output-generation)
+ - [Environment variables](#environment-variables)
  - [Examples](#examples)
  - [Exit codes](#exit-codes)
 
@@ -62,7 +65,7 @@ Controls what gets scanned.
  | `--max-routes` | `<num>` | `10` | Maximum unique same-origin paths to discover and scan. |
  | `--crawl-depth` | `<num>` | `2` | How deep to follow links during BFS discovery (1-3). Has no effect when `--routes` is set. |
  | `--routes` | `<csv>` | — | Explicit paths to scan (e.g. `/,/about,/contact`). Overrides auto-discovery entirely. |
- | `--project-dir` | `<path>` | — | Path to the audited project source. Enables the source code pattern scanner and framework auto-detection from package.json. |
+ | `--project-dir` | `<path>` | — | Path to the audited project source on disk. Enables source code pattern scanning and framework auto-detection from the local `package.json`. |
 
  **Route discovery logic**:
  1. If the target has a `sitemap.xml`, all listed URLs are used (up to `--max-routes`).
@@ -71,6 +74,41 @@ Controls what gets scanned.
 
  ---
 
+ ### Repository & remote scanning
+
+ Enables source code analysis via the GitHub API — no `git clone` required.
+
+ | Flag | Argument | Default | Description |
+ | :--- | :--- | :--- | :--- |
+ | `--repo-url` | `<url>` | — | GitHub repository URL (e.g. `https://github.com/owner/repo`). Fetches `package.json` for framework detection and runs source code pattern scanning against the repo via the GitHub API. Mutually exclusive with `--project-dir` for remote usage. |
+ | `--github-token` | `<token>` | — | GitHub personal access token. Increases the GitHub API rate limit from 60 to 5,000 req/hour. Required for private repositories. Falls back to `GH_TOKEN` env var if not provided. |
+
+ When `--repo-url` is provided:
+ 1. The engine fetches `package.json` via `raw.githubusercontent.com` to detect the project framework.
+ 2. Source code patterns are run against the repo file tree using the GitHub Trees API and Contents API, with no local filesystem access.
+ 3. The detected framework is passed to the analyzer for framework-specific fix notes.
+
+ ---
+
+ ### AI enrichment
+
+ Controls Claude-powered fix suggestion enrichment. Requires `ANTHROPIC_API_KEY` to be set.
+
+ | Flag | Argument | Default | Description |
+ | :--- | :--- | :--- | :--- |
+ | *(no flag)* | — | — | AI enrichment is activated automatically when `ANTHROPIC_API_KEY` env var is present. There is no `--ai-enabled` flag — set or unset the env var to control it. |
+
+ AI enrichment runs after the analyzer step and enriches Critical and Serious findings (up to 20 per scan) with:
+ - A specific fix description referencing the actual selector, colors, and violation data
+ - A production-quality code snippet in the correct framework syntax
+ - Context-aware suggestions when repo source files are available via `--repo-url`
+
+ Original engine fixes are always preserved. AI output is stored in separate fields (`ai_fix_description`, `ai_fix_code`). Enriched findings are flagged with `aiEnhanced: true`.
+
+ The system prompt is customizable via `AI_SYSTEM_PROMPT` env var.
+
+ ---
+
  ### Audit intelligence
 
  Controls how findings are interpreted and filtered.
@@ -117,6 +155,16 @@ Controls what artifacts are written.
 
  ---
 
+ ## Environment variables
+
+ | Variable | Description |
+ | :--- | :--- |
+ | `ANTHROPIC_API_KEY` | Enables Claude AI enrichment. Set to a valid Anthropic API key. When absent, AI enrichment is silently skipped. |
+ | `AI_SYSTEM_PROMPT` | Custom system prompt for Claude. Overrides the default prompt for the entire scan. Useful for domain-specific fix guidance or custom output formats. |
+ | `GH_TOKEN` | GitHub personal access token. Used by the AI enrichment step when fetching source files from the repo. Equivalent to `--github-token` but read from the environment. |
+
+ ---
+
  ## Examples
 
  ### Minimal scan
@@ -135,7 +183,7 @@ a11y-audit \
  --output ./audit/report.html
  ```
 
- ### Include source code intelligence
+ ### Include source code intelligence (local)
 
  ```bash
  a11y-audit \
@@ -145,6 +193,32 @@ a11y-audit \
  --output ./audit/report.html
  ```
 
+ ### Scan with remote GitHub repository (no clone)
+
+ ```bash
+ a11y-audit \
+ --base-url https://example.com \
+ --repo-url https://github.com/owner/repo \
+ --github-token ghp_...
+ ```
+
+ ### Scan with AI enrichment
+
+ ```bash
+ ANTHROPIC_API_KEY=sk-ant-... a11y-audit \
+ --base-url https://example.com \
+ --repo-url https://github.com/owner/repo \
+ --github-token ghp_...
+ ```
+
+ ### Scan with custom AI system prompt
+
+ ```bash
+ AI_SYSTEM_PROMPT="You are an expert in Vue.js accessibility. Focus on component-level fixes." \
+ ANTHROPIC_API_KEY=sk-ant-... \
+ a11y-audit --base-url https://example.com --repo-url https://github.com/owner/repo
+ ```
+
  ### Focused re-audit — single rule, single route
 
  ```bash
@@ -16,9 +16,12 @@ This document is the current technical inventory of the engine package.
  | `src/core/utils.mjs` | Logging, JSON I/O, shared helpers |
  | `src/core/asset-loader.mjs` | Centralized asset map and loader |
  | `src/core/toolchain.mjs` | Environment/toolchain checks |
+ | `src/core/github-api.mjs` | GitHub API client — `fetchPackageJson`, `fetchRepoFile`, `listRepoFiles`, `parseRepoUrl`. Used for remote repo scanning and AI source file fetching. |
  | `src/pipeline/dom-scanner.mjs` | Runtime scan stage (axe/CDP/pa11y + merge) |
  | `src/enrichment/analyzer.mjs` | Finding enrichment and metadata synthesis |
- | `src/source-patterns/source-scanner.mjs` | Static source-pattern scanner |
+ | `src/ai/claude.mjs` | Claude AI client — calls the Anthropic API to enrich findings with context-aware fix suggestions. Accepts custom system prompt via `options.systemPrompt`. |
+ | `src/ai/enrich.mjs` | CLI AI enrichment subprocess — reads `a11y-findings.json`, calls `enrichWithAI()`, writes enriched findings back. Activated by `ANTHROPIC_API_KEY` env var. |
+ | `src/source-patterns/source-scanner.mjs` | Source code pattern scanner — works with local `--project-dir` or remote `--repo-url` via GitHub API |
  | `src/reports/html.mjs` | HTML report builder |
  | `src/reports/pdf.mjs` | PDF report builder |
  | `src/reports/md.mjs` | Markdown remediation builder |
@@ -152,8 +152,22 @@ A single finding can match multiple personas. The persona configuration (`person
  The compliance score is computed from severity totals using weights defined in `assets/reporting/compliance-config.mjs`:
 
  1. **Severity totals** — counts findings by `Critical`, `Serious`, `Moderate`, `Minor` (excluding AAA and Best Practice findings).
- 2. **Score** — starts at 100, deducts weighted points per finding.
- 3. **Label** maps score ranges to grades (`Excellent`, `Good Compliance`, `Needs Improvement`, `Poor`, `Critical`).
+ 2. **Score** — starts at 100, deducts weighted points per finding:
+ - Critical: −15 per finding
+ - Serious: −5 per finding
+ - Moderate: −2 per finding
+ - Minor: −0.5 per finding
+ - Score is clamped to 0–100 and rounded to nearest integer.
+ 3. **Label** — maps score ranges to grades:
+
+ | Score | Label |
+ | :--- | :--- |
+ | 90 – 100 | `Excellent` |
+ | 75 – 89 | `Good` |
+ | 55 – 74 | `Fair` |
+ | 35 – 54 | `Poor` |
+ | 0 – 34 | `Critical` |
+
  4. **WCAG status** — `Pass` (no findings), `Conditional Pass` (only Moderate/Minor), or `Fail` (any Critical/Serious).
 
  The `overallAssessment` in metadata follows the same logic for the formal compliance verdict.
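The scoring rule documented in the hunk above is simple enough to check by hand. A minimal sketch follows, with the weights and label bands copied from that hunk; the names `WEIGHTS`, `LABEL_BANDS`, and `computeComplianceScore` are hypothetical, and the real values live in `assets/reporting/compliance-config.mjs`, which this diff does not show.

```javascript
// Weights and bands copied from the documentation hunk above.
// Constant and function names are hypothetical, not the engine's.
const WEIGHTS = { Critical: 15, Serious: 5, Moderate: 2, Minor: 0.5 };
const LABEL_BANDS = [
  { min: 90, label: "Excellent" },
  { min: 75, label: "Good" },
  { min: 55, label: "Fair" },
  { min: 35, label: "Poor" },
  { min: 0, label: "Critical" },
];

function computeComplianceScore(totals) {
  // Sum the weighted deduction for each severity (missing counts are 0).
  let penalty = 0;
  for (const [severity, weight] of Object.entries(WEIGHTS)) {
    penalty += (totals[severity] ?? 0) * weight;
  }
  // Clamp to 0-100, then round to the nearest integer.
  const score = Math.round(Math.min(100, Math.max(0, 100 - penalty)));
  const { label } = LABEL_BANDS.find((band) => score >= band.min);
  return { score, label };
}

// 100 - (1*15 + 2*5 + 3*2 + 2*0.5) = 68
console.log(computeComplianceScore({ Critical: 1, Serious: 2, Moderate: 3, Minor: 2 }));
// { score: 68, label: "Fair" }
```

Note that a single Critical finding already drops a clean site from `Excellent` (100) to `Good` (85) under these weights.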
@@ -187,14 +201,54 @@ The source scanner (`src/source-patterns/source-scanner.mjs`) detects accessibil
 
  5. Output includes a summary with `total`, `confirmed`, and `potential` counts.
 
+ ### Remote scanning via GitHub API
+
+ When `--repo-url` (CLI) or `options.repoUrl` (programmatic API) is provided instead of `--project-dir`, the source scanner uses the GitHub API — no `git clone` required:
+
+ 1. `listRepoFiles()` fetches the repo file tree using the GitHub Trees API. Falls back to the Contents API for truncated responses (large repos).
+ 2. Files matching each pattern's `globs` are fetched individually via `raw.githubusercontent.com`.
+ 3. The same regex and context rejection logic runs against the fetched content.
+ 4. Results are identical to local scanning.
+
+ A GitHub token (`--github-token` or `GH_TOKEN` env var) increases the API rate limit from 60 to 5,000 req/hour and enables private repo access.
+
  ### Integration with the audit pipeline
 
- When `runAudit` is called with `projectDir` and without `skipPatterns`:
+ When `runAudit` is called with `projectDir` or `repoUrl` and without `skipPatterns`:
+
+ 1. The engine fetches `package.json` from the repo (remote) or reads it from disk (local) to detect the framework before the analyzer runs.
+ 2. The analyzer runs with the detected framework context.
+ 3. Source patterns run after enrichment.
+ 4. Pattern findings are attached to the payload as `patternFindings` with their own `generated_at`, `project_dir`, `findings`, and `summary`.
+ 5. The remediation guide (`getRemediationGuide`) renders pattern findings in a dedicated section.
+
+ ### pa11y ruleId normalization
+
+ pa11y reports violations using dotted WCAG criterion codes (e.g. `WCAG2AA.Principle1.Guideline1_4.1_4_3.G18.Fail`). The engine normalizes these in two places:
+
+ 1. **Equivalence mapping** (`assets/scanning/pa11y-config.mjs`, `equivalenceMap`) — known pa11y codes are mapped to their axe-core equivalent rule ID (e.g. `Principle1.Guideline1_4.1_4_3.G145` → `color-contrast`). These findings are merged and deduplicated with axe findings.
+
+ 2. **Fallback normalization** (`src/pipeline/dom-scanner.mjs`) — pa11y codes without an axe equivalent are shortened to their last segment (e.g. `WCAG2AAA.Principle1.Guideline1_4.1_4_6.G17` → `pa11y-g17`). This produces a readable rule ID without the full dotted path.
+
+ ## AI Enrichment
+
+ After the analyzer step, the engine optionally runs Claude-powered enrichment on Critical and Serious findings (up to 20 per scan).
+
+ ### How it works
+
+ 1. `src/ai/enrich.mjs` reads `a11y-findings.json`, identifies Critical and Serious findings, and sends them to `enrichWithAI()`.
+ 2. `src/ai/claude.mjs` calls the Anthropic API with a system prompt instructing Claude to generate specific, production-quality fix suggestions using the actual violation data (selector, colors, ratio, etc.).
+ 3. When a repo URL is available (`A11Y_REPO_URL` env var), Claude also receives relevant source files fetched via the GitHub API. File selection is scored by how well each file path matches terms extracted from the finding's selector and title.
+ 4. Claude returns a JSON array of improvements. Each improvement contains a `fixDescription` and `fixCode` specific to the finding's context.
+ 5. The engine stores Claude's output in separate fields (`ai_fix_description`, `ai_fix_code`, `ai_fix_code_lang`) — the original engine fixes are preserved unchanged. Improved findings are flagged with `aiEnhanced: true`.
+
+ ### Activation
+
+ AI enrichment runs automatically when `ANTHROPIC_API_KEY` is present in the environment. It is non-fatal — if the API call fails, the pipeline continues with unenriched findings.
+
+ ### Custom system prompt
 
- 1. The analyzer runs first to detect the framework.
- 2. Source patterns run after enrichment.
- 3. Pattern findings are attached to the payload as `patternFindings` with their own `generated_at`, `project_dir`, `findings`, and `summary`.
- 4. The remediation guide (`getRemediationGuide`) renders pattern findings in a dedicated section.
+ The default system prompt instructs Claude to go beyond the generic fix: explain why the issue matters for users, reference the specific selector and violation data, and provide a more complete code example than the engine's default. The prompt can be overridden per-scan via the `AI_SYSTEM_PROMPT` env var or `options.ai.systemPrompt` in the programmatic API.
 
  ## Assets Reference
 
package/docs/outputs.md CHANGED
@@ -10,6 +10,7 @@
  - [progress.json](#progressjson)
  - [a11y-scan-results.json](#a11y-scan-resultsjson)
  - [a11y-findings.json](#a11y-findingsjson)
+ - [a11y-pattern-findings.json](#a11y-pattern-findingsjson)
  - [remediation.md](#remediationmd)
  - [report.html](#reporthtml)
  - [report.pdf](#reportpdf)
@@ -90,7 +91,7 @@ Merged results from all three engines (axe-core + CDP + pa11y) per route. Writte
  }
  ```
 
- Each violation in the `violations` array includes a `source` field indicating which engine produced it (`undefined` for axe-core, `"cdp"` for CDP checks, `"pa11y"` for pa11y).
+ Each violation in the `violations` array includes a `source` field: `"cdp"` for CDP checks, `"pa11y"` for pa11y, and absent (field not set) for axe-core violations.
 
  This file is consumed by `analyzer.mjs` and also used by `--affected-only` to determine which routes to re-scan on subsequent runs.
 
@@ -176,6 +177,25 @@ The primary enriched data artifact. Written by `src/enrichment/analyzer.mjs`. Th
  | `verification_command_fallback` | `string\|null` | Fallback verify command |
  | `pages_affected` | `number\|null` | Number of pages with this violation |
  | `affected_urls` | `string[]\|null` | All URLs where this violation appears |
+ | `aiEnhanced` | `boolean` | `true` when Claude improved the fix for this finding. Only present on AI-enriched findings. |
+ | `ai_fix_description` | `string\|null` | Claude-generated fix description. More specific than `fix_description` — references the actual selector, colors, and violation data. Only present when `aiEnhanced` is `true`. |
+ | `ai_fix_code` | `string\|null` | Claude-generated code snippet in the correct framework syntax. Separate from the engine's `fix_code`. Only present when `aiEnhanced` is `true`. |
+ | `ai_fix_code_lang` | `string\|null` | Language of `ai_fix_code` (e.g. `jsx`, `tsx`, `vue`, `css`). Only present when `aiEnhanced` is `true`. |
+
+ > **Note on `ownership_status`**: Values are `"primary"` (issue is in the project's source), `"outside_primary_source"` (issue is in a third-party component), or `"unknown"`. These are different from the pattern finding `status` field which uses `"confirmed"` and `"potential"`.
+
+ ### Top-level payload keys (after AI enrichment)
+
+ When AI enrichment runs, the engine appends `ai_enriched_findings` to the payload root. `getFindings()` uses this as a fast path — if present, it returns `ai_enriched_findings` directly without re-normalizing the raw `findings` array.
+
+ ```json
+ {
+   "metadata": { ... },
+   "findings": [ ... ],
+   "ai_enriched_findings": [ ... ],
+   "incomplete_findings": [ ... ]
+ }
+ ```
 
  ### `incomplete_findings`
 
@@ -183,6 +203,45 @@ Violations that axe-core flagged as "needs review" (not confirmed pass or fail).
 
  ---
 
+ ## a11y-pattern-findings.json
+
+ Source code pattern scan results. Written by `src/source-patterns/source-scanner.mjs` when `--project-dir` or `--repo-url` is provided (and `--skip-patterns` is not set).
+
+ ```json
+ {
+   "generated_at": "2026-03-16T00:00:00.000Z",
+   "project_dir": "https://github.com/owner/repo",
+   "findings": [ ... ],
+   "summary": {
+     "total": 5,
+     "confirmed": 3,
+     "potential": 2
+   }
+ }
+ ```
+
+ ### Per-finding fields
+
+ | Field | Type | Description |
+ | :--- | :--- | :--- |
+ | `id` | `string` | Deterministic finding ID |
+ | `pattern_id` | `string` | Pattern definition ID (e.g. `placeholder-only-label`) |
+ | `title` | `string` | Pattern title |
+ | `severity` | `string` | `Critical`, `Serious`, `Moderate`, or `Minor` |
+ | `wcag` | `string` | WCAG success criterion string |
+ | `wcag_criterion` | `string` | WCAG criterion ID |
+ | `wcag_level` | `string` | `A`, `AA`, or `AAA` |
+ | `type` | `string` | Pattern type (`structural`, `css`, etc.) |
+ | `fix_description` | `string\|null` | How to fix this pattern |
+ | `status` | `string` | `confirmed` (regex match without reject context) or `potential` (match with uncertainty) |
+ | `file` | `string` | File path within the repo (e.g. `src/components/Button.tsx`) |
+ | `line` | `number` | Line number of the match |
+ | `match` | `string` | The matched line content |
+ | `context` | `string` | 7-line code context window around the match |
+ | `source` | `string` | Always `"code-pattern"` |
+
+ ---
+
  ## remediation.md
 
  AI agent-optimized remediation guide. Always generated (even without `--with-reports`). Written to `.audit/remediation.md`.
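The `summary` object of `a11y-pattern-findings.json` shown in the hunk above can be derived from the per-finding `status` field. A minimal sketch, assuming a hypothetical helper name `summarizePatternFindings`; how the scanner itself builds the summary is not shown in this diff.

```javascript
// Hypothetical helper: rebuilds the { total, confirmed, potential }
// summary from the per-finding `status` values documented above.
function summarizePatternFindings(findings) {
  const summary = { total: findings.length, confirmed: 0, potential: 0 };
  for (const finding of findings) {
    if (finding.status === "confirmed") summary.confirmed += 1;
    else if (finding.status === "potential") summary.potential += 1;
  }
  return summary;
}

const sample = [
  { pattern_id: "placeholder-only-label", status: "confirmed" },
  { pattern_id: "placeholder-only-label", status: "potential" },
  { pattern_id: "placeholder-only-label", status: "confirmed" },
];
console.log(summarizePatternFindings(sample)); // { total: 3, confirmed: 2, potential: 1 }
```

A check like this is a cheap way to validate a pattern report before rendering it, since `total` should equal `confirmed + potential` when every finding carries one of the two documented statuses.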
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@diegovelasquezweb/a11y-engine",
-   "version": "0.8.1",
+   "version": "0.8.3",
    "description": "WCAG 2.2 accessibility audit engine — scanner, analyzer, and report builders",
    "type": "module",
    "license": "MIT",
package/src/ai/claude.mjs CHANGED
@@ -132,6 +132,54 @@ async function callClaude(apiKey, model, systemPrompt, userMessage) {
   * @param {string|undefined} githubToken
   * @returns {Promise<Record<string, string>>}
   */
+ /**
+  * Extracts candidate component/class names from a CSS selector or HTML snippet.
+  * e.g. ".trustarc-banner-right > span" → ["trustarc", "banner"]
+  * e.g. "#search-input" → ["search", "input"]
+  */
+ function extractSearchTermsFromFinding(finding) {
+   const terms = new Set();
+   const sources = [
+     finding.primarySelector || finding.selector || "",
+     finding.title || "",
+   ];
+
+   for (const src of sources) {
+     // Extract class names: .foo-bar → ["foo", "bar"]
+     const classes = src.match(/\.[\w-]+/g) || [];
+     for (const cls of classes) {
+       const parts = cls.slice(1).split(/[-_]/);
+       for (const p of parts) {
+         if (p.length > 3) terms.add(p.toLowerCase());
+       }
+     }
+     // Extract IDs: #foo-bar → ["foo", "bar"]
+     const ids = src.match(/#[\w-]+/g) || [];
+     for (const id of ids) {
+       const parts = id.slice(1).split(/[-_]/);
+       for (const p of parts) {
+         if (p.length > 3) terms.add(p.toLowerCase());
+       }
+     }
+     // Extract data attributes: [data-component="Foo"] → ["foo"]
+     const dataAttrs = src.match(/data-[\w-]+=["']?[\w-]+["']?/g) || [];
+     for (const attr of dataAttrs) {
+       const val = attr.split(/=["']?/)[1]?.replace(/["']/, "").toLowerCase();
+       if (val && val.length > 3) terms.add(val);
+     }
+   }
+
+   return [...terms].slice(0, 5);
+ }
+
+ /**
+  * Scores a file path by how many search terms it contains.
+  */
+ function scoreFilePath(filePath, terms) {
+   const lower = filePath.toLowerCase();
+   return terms.filter((t) => lower.includes(t)).length;
+ }
+
  async function fetchSourceFilesForFindings(findings, repoUrl, githubToken) {
    const sourceFiles = {};
    if (!repoUrl) return sourceFiles;
@@ -139,30 +187,50 @@ async function fetchSourceFilesForFindings(findings, repoUrl, githubToken) {
    const { fetchRepoFile, listRepoFiles, parseRepoUrl } = await import("../core/github-api.mjs");
    if (!parseRepoUrl(repoUrl)) return sourceFiles;
 
-   const patterns = new Set(
-     findings
-       .filter((f) => f.fileSearchPattern)
-       .map((f) => f.fileSearchPattern)
-   );
+   // Collect all extensions needed
+   const extensions = new Set();
+   for (const f of findings) {
+     if (!f.fileSearchPattern) continue;
+     const extMatch = f.fileSearchPattern.match(/\*\.(\w+)$/);
+     if (extMatch) extensions.add(`.${extMatch[1]}`);
+   }
+   if (extensions.size === 0) return sourceFiles;
 
-   for (const pattern of patterns) {
-     try {
-       // Extract extension from pattern (e.g. "src/components/*.tsx" -> ".tsx")
-       const extMatch = pattern.match(/\*\.(\w+)$/);
-       if (!extMatch) continue;
-       const ext = `.${extMatch[1]}`;
-
-       const files = await listRepoFiles(repoUrl, [ext], githubToken);
-       // Pick up to 3 most relevant files per pattern
-       const relevant = files.slice(0, 3);
-       for (const filePath of relevant) {
-         if (!sourceFiles[filePath]) {
-           const content = await fetchRepoFile(repoUrl, filePath, githubToken);
-           if (content) sourceFiles[filePath] = content;
-         }
-       }
-     } catch {
-       // non-fatal
+   // Fetch full file list once
+   let allFiles = [];
+   try {
+     allFiles = await listRepoFiles(repoUrl, [...extensions], githubToken);
+   } catch {
+     return sourceFiles;
+   }
+
+   // For each finding, find the most relevant files by selector/title terms
+   const MAX_FILES_PER_FINDING = 2;
+   const MAX_TOTAL_FILES = 6;
+
+   for (const finding of findings) {
+     if (Object.keys(sourceFiles).length >= MAX_TOTAL_FILES) break;
+
+     const terms = extractSearchTermsFromFinding(finding);
+
+     // Score and sort files by relevance to this finding
+     const scored = allFiles
+       .map((fp) => ({ fp, score: scoreFilePath(fp, terms) }))
+       .filter(({ score }) => score > 0)
+       .sort((a, b) => b.score - a.score);
+
+     // Fall back to first files if no relevant match found
+     const candidates = scored.length > 0
+       ? scored.slice(0, MAX_FILES_PER_FINDING).map(({ fp }) => fp)
+       : allFiles.slice(0, 1);
+
+     for (const filePath of candidates) {
+       if (sourceFiles[filePath]) continue;
+       if (Object.keys(sourceFiles).length >= MAX_TOTAL_FILES) break;
+       try {
+         const content = await fetchRepoFile(repoUrl, filePath, githubToken);
+         if (content) sourceFiles[filePath] = content;
+       } catch { /* non-fatal */ }
      }
    }
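To see what the new relevance scoring does on concrete input, the two helpers from the hunk above can be run against a small file list without any network access. In the sketch below, `extractSearchTermsFromFinding` and `scoreFilePath` are copied from the diff; `rankFiles`, the sample file paths, and the sample finding are illustrative assumptions that mirror the scoring and sorting steps of `fetchSourceFilesForFindings` without the GitHub calls.

```javascript
// Copied from the claude.mjs hunk above.
function extractSearchTermsFromFinding(finding) {
  const terms = new Set();
  const sources = [
    finding.primarySelector || finding.selector || "",
    finding.title || "",
  ];
  for (const src of sources) {
    const classes = src.match(/\.[\w-]+/g) || [];
    for (const cls of classes) {
      for (const p of cls.slice(1).split(/[-_]/)) {
        if (p.length > 3) terms.add(p.toLowerCase());
      }
    }
    const ids = src.match(/#[\w-]+/g) || [];
    for (const id of ids) {
      for (const p of id.slice(1).split(/[-_]/)) {
        if (p.length > 3) terms.add(p.toLowerCase());
      }
    }
    const dataAttrs = src.match(/data-[\w-]+=["']?[\w-]+["']?/g) || [];
    for (const attr of dataAttrs) {
      const val = attr.split(/=["']?/)[1]?.replace(/["']/, "").toLowerCase();
      if (val && val.length > 3) terms.add(val);
    }
  }
  return [...terms].slice(0, 5);
}

// Copied from the claude.mjs hunk above.
function scoreFilePath(filePath, terms) {
  const lower = filePath.toLowerCase();
  return terms.filter((t) => lower.includes(t)).length;
}

// Illustrative only: the score/filter/sort/slice steps, minus the fetching.
function rankFiles(allFiles, finding, maxFiles = 2) {
  const terms = extractSearchTermsFromFinding(finding);
  return allFiles
    .map((fp) => ({ fp, score: scoreFilePath(fp, terms) }))
    .filter(({ score }) => score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, maxFiles)
    .map(({ fp }) => fp);
}

// Hypothetical repo file list and finding.
const files = [
  "src/components/Header.tsx",
  "src/components/Banner.tsx",
  "src/components/TrustarcBanner.tsx",
];
const finding = { selector: ".trustarc-banner-right > span", title: "Color contrast" };

console.log(extractSearchTermsFromFinding(finding)); // ["trustarc", "banner", "right"]
console.log(rankFiles(files, finding));
// ["src/components/TrustarcBanner.tsx", "src/components/Banner.tsx"]
```

As written, the `p.length > 3` filter also keeps `right` for this selector, so the JSDoc example in the hunk (`["trustarc", "banner"]`) appears to understate the actual result. Matching is a plain substring test on the lowercased path, so short or generic terms can score unrelated files.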
 
@@ -856,8 +856,10 @@ function buildFindings(inputPayload, cliArgs) {
  selector: selectors.join(", "),
  impacted_users: getImpactedUsers(v.id, v.tags),
  primary_selector: bestSelector,
- actual:
-   firstNode?.failureSummary || `Found ${nodes.length} instance(s).`,
+ actual: (() => {
+   const raw = firstNode?.failureSummary || `Found ${nodes.length} instance(s).`;
+   return raw.replace(/^Fix any of the following:\s*/i, "").trim();
+ })(),
  primary_failure_mode: failureInsights.primaryFailureMode,
  relationship_hint: failureInsights.relationshipHint,
  failure_checks: failureInsights.failureChecks,
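The change to `actual` in the hunk above strips a fixed lead-in from the failure summary. A small standalone sketch of the same expression follows, wrapped in a hypothetical function `cleanFailureSummary` so it can be run in isolation; the sample summaries are illustrative.

```javascript
// Same expression as the new `actual` IIFE above, wrapped in a
// hypothetical function so it can be exercised on its own.
function cleanFailureSummary(failureSummary, nodeCount) {
  const raw = failureSummary || `Found ${nodeCount} instance(s).`;
  return raw.replace(/^Fix any of the following:\s*/i, "").trim();
}

// The lead-in and the whitespace after it (including the newline) are removed.
console.log(cleanFailureSummary("Fix any of the following:\n  Element has insufficient color contrast", 1));
// "Element has insufficient color contrast"

// With no summary, the fallback text passes through unchanged.
console.log(cleanFailureSummary(undefined, 3)); // "Found 3 instance(s)."
```

The regex is anchored at the start and matches only the exact phrase "Fix any of the following:", so a summary with a different lead-in is returned as-is apart from trimming.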