npm - argusqa-os - Versions diffs - 9.7.6 → 9.8.0 - Mend

argusqa-os 9.7.6 → 9.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17) hide show

package/README.md +9 -9
package/glama.json +2 -2
package/package.json +6 -6
package/src/cli/pr-validate.js +275 -56
package/src/mcp-server.js +142 -26
package/src/orchestration/crawl-and-report.js +1 -1
package/src/orchestration/orchestrator.js +34 -0
package/src/utils/audit-depth.js +148 -0
package/src/utils/deploy-preview.js +210 -0
package/src/utils/github-api.js +242 -0
package/src/utils/github-reporter.js +251 -39
package/src/utils/html-reporter.js +283 -92
package/src/utils/import-graph.js +290 -0
package/src/utils/parallel-crawler.js +202 -0
package/src/utils/pr-baseline.js +230 -0
package/src/utils/pr-diff-analyzer.js +378 -40
package/src/utils/route-discoverer.js +25 -3

package/README.md CHANGED Viewed

@@ -4,7 +4,7 @@
 [![npm](https://img.shields.io/npm/v/argusqa-os?color=7C3AED)](https://www.npmjs.com/package/argusqa-os)
 [![MCP Server](https://glama.ai/mcp/servers/ironclawdevs27/Argus/badges/card.svg)](https://glama.ai/mcp/servers/ironclawdevs27/Argus)
-[![Harness](https://img.shields.io/badge/harness-846%2F846-4ADE80)](test-harness/)
+[![Harness](https://img.shields.io/badge/harness-961%2F961-4ADE80)](test-harness/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
 **Argus catches the bugs your test suite misses — visual regressions, API loops, CSS drift, console noise, accessibility failures, and more — and delivers rich reports to Slack (or a local HTML dashboard).**
@@ -109,7 +109,7 @@ Ask Claude (or any MCP client) — no terminal required:
 | `argus_last_report` | Return last JSON report without re-running |
 | `argus_design_audit` | Figma URL → 13 design-token finding types (color, spacing, typography, shadows, etc.) |
 | `argus_visual_diff` | Screenshot baseline comparison. Pass `updateBaseline: true` to reset. |
-| `argus_pr_validate` | Fetch GitHub PR diff → map changed files to affected routes → targeted audit → `{ blocked, findings }` |
+| `argus_pr_validate` | Fetch GitHub PR diff → map changed files to affected routes → targeted audit → baseline-aware block decision (blocks on findings the PR *introduces*) + idempotent PR comment + Check Run → `{ blocked, findings, baseline, reporting }` |
 **Example prompts:**
@@ -217,8 +217,8 @@ npm run report:html    # Generate reports/report.html from last JSON audit
 npm run report:pdf     # Export HTML report to A4 PDF (requires: npm install puppeteer)
 npm run server         # Start Slack slash-command server (port 3001)
 npm run init           # Interactive setup wizard
-npm run test:unit          # 94 unit tests — no Chrome required
-npm run test:harness       # 149-block correctness harness — requires Chrome
+npm run test:unit          # 366 unit tests — no Chrome required
+npm run test:harness       # 166-block correctness harness — requires Chrome
 npm run test:harness:log   # same, but tees full output to harness-results.txt
 npm run test:coverage      # merged unit + harness coverage gate (requires Chrome)
 ```
@@ -283,7 +283,7 @@ The included [workflow](.github/workflows/argus.yml) runs on push to `main`, dai
 | `ARGUS_RETRY_ATTEMPTS` | `3` | Max retries for `navigate`/`fill` MCP calls |
 | `ARGUS_WATCH_INTERVAL_MS` | `1000` | Watch mode poll interval (ms) |
 | `ARGUS_WATCH_UI_PORT` | `3002` | Watch mode web dashboard port |
-| `ARGUS_SOURCE_DIR` | — | App source path — enables env-var / feature-flag / dead-route analysis |
+| `ARGUS_SOURCE_DIR` | — | App source path — enables env-var / feature-flag / dead-route analysis **and** framework-aware PR route mapping (import-graph: a changed component/stylesheet → only the routes that render it) |
 | `ARGUS_ENV_FILE` | — | Path to app `.env` for codebase cross-reference |
 | `SCREENSHOT_DIFF_THRESHOLD` | `0.5` | Pixel diff % threshold for environment comparison |
 | `GITHUB_TOKEN` | — | For PR comments + Check Runs |
@@ -343,7 +343,7 @@ Argus is a **complementary layer**, not a replacement for unit or E2E tests:
 ## Known Limitations
-All 846 harness assertions pass (`846/846`) — there are currently no known MCP- or Chrome-layer restrictions. Lighthouse now runs in headless (after the `lighthouse_audit` argument fix); the remaining soft assertions (perf traces, GC-dependent heap-growth) are promoted to counted hard assertions only in the weekly strict-soft lane (`harness-strict.yml`) via `ARGUS_HARNESS_STRICT_SOFT`.
+All 961 harness assertions pass (`961/961`) — there are currently no known MCP- or Chrome-layer restrictions. Lighthouse now runs in headless (after the `lighthouse_audit` argument fix); the remaining soft assertions (perf traces, GC-dependent heap-growth) are promoted to counted hard assertions only in the weekly strict-soft lane (`harness-strict.yml`) via `ARGUS_HARNESS_STRICT_SOFT`.
 ---
@@ -362,8 +362,8 @@ src/
     chrome-launcher.js  — npm run chrome / argus-chrome — launches Chrome with correct flags
     doctor.js           — npm run doctor / argus-doctor — pre-flight checks
     pr-validate.js      — headless CI entry point for GitHub Actions
-test-harness/           — 149-block correctness harness, 846 hard assertions, 63 fixture pages
-test/unit/              — 94 Vitest unit tests (no Chrome required)
+test-harness/           — 166-block correctness harness, 961 hard assertions, 63 fixture pages
+test/unit/              — 366 Vitest unit tests (no Chrome required)
 landing/                — Product landing page (React 19 + Vite + Tailwind)
 ```
@@ -374,7 +374,7 @@ Full source map → [CLAUDE.md](CLAUDE.md) · MCP/DSL reference → [SKILL.md](S
 ## Contributing
 1. Fork the repo and create a branch
-2. `npm run test:unit` — verify without Chrome (94 tests)
+2. `npm run test:unit` — verify without Chrome (366 tests)
 3. `npm run test:harness` — full integration coverage (requires Chrome on port 9222)
 4. Open a PR — Argus audits itself via the CI workflow

package/glama.json CHANGED Viewed

@@ -1,7 +1,7 @@
 {
   "$schema": "https://glama.ai/mcp/schemas/server.json",
   "name": "argus",
-  "description": "AI-powered QA harness that audits web apps via Chrome DevTools Protocol. Catches JS errors, network failures, a11y violations, SEO issues, security headers, CSS regressions, and more — directly from Claude conversations. 9 MCP tools: argus_audit (fast 8-analyzer pass), argus_audit_full (Lighthouse + memory + responsive), argus_compare (dev vs staging diff), argus_last_report (retrieve last JSON report), argus_watch_snapshot (live tab snapshot without navigating), argus_get_context (LLM-optimized context + fix loop with snapshot_id diff), argus_design_audit (Figma design fidelity — 13 finding types), argus_visual_diff (screenshot baseline comparison, updateBaseline flag), argus_pr_validate (PR diff → affected routes → targeted audit → blocked flag). Every finding is post-processed with intelligent baseline filtering (cross-run noise classifier) and root cause linking (recent git commits mapped to new findings). 149 test blocks, 846 hard assertions, 67 detection categories.",
+  "description": "AI-powered QA harness that audits web apps via Chrome DevTools Protocol. Catches JS errors, network failures, a11y violations, SEO issues, security headers, CSS regressions, and more — directly from Claude conversations. 9 MCP tools: argus_audit (fast 8-analyzer pass), argus_audit_full (Lighthouse + memory + responsive), argus_compare (dev vs staging diff), argus_last_report (retrieve last JSON report), argus_watch_snapshot (live tab snapshot without navigating), argus_get_context (LLM-optimized context + fix loop with snapshot_id diff), argus_design_audit (Figma design fidelity — 13 finding types), argus_visual_diff (screenshot baseline comparison, updateBaseline flag), argus_pr_validate (PR diff → affected routes → baseline-aware targeted audit + PR comment/Check Run → blocked flag). Every finding is post-processed with intelligent baseline filtering (cross-run noise classifier) and root cause linking (recent git commits mapped to new findings). 166 test blocks, 961 hard assertions, 67 detection categories.",
   "maintainers": ["ironclawdevs27"],
   "tools": [
     {
@@ -38,7 +38,7 @@
     },
     {
       "name": "argus_pr_validate",
-      "description": "Targeted QA audit driven by a GitHub pull request diff. Fetches the PR's changed files, maps them to affected routes in your target config using path-slug heuristics (infrastructure changes trigger a full audit), then audits only those routes. Returns { findings, affectedRoutes, changedFiles, perRoute, summary, blocked, blockOn }. Use in CI to gate merges — blocked:true when findings meet the blockOn threshold (none/warning/critical, default: critical). Requires Chrome on --remote-debugging-port=9222. GITHUB_TOKEN env var recommended for private repos."
+      "description": "Targeted QA audit driven by a GitHub pull request diff. Fetches the PR's changed files, maps them to affected routes in your target config using path-slug heuristics (infrastructure changes trigger a full audit) — or, when ARGUS_SOURCE_DIR points at the checked-out app source, framework-aware import-graph mapping that narrows a changed component or stylesheet to only the routes whose pages import it (Next.js convention + monorepo-aware, conservative-fallback on any ambiguity so a regression is never missed), then audits only those routes. The audit target is resolved per-PR: an explicit targetUrl, else the PR's deploy-preview URL (ARGUS_PREVIEW_URL or opt-in GitHub-Deployments auto-detection), else TARGET_DEV_URL. Routes are audited with bounded concurrency (ARGUS_CONCURRENCY) and each route audit is timeout-bounded so a hung audit blocks rather than silently passing. Returns { findings, affectedRoutes, changedFiles, perRoute, summary, blocked, blockOn, baseline, reporting }. Blocking is baseline-aware: it gates on the findings the PR introduces vs a stored per-branch baseline (reports/baselines/<base-branch>.json, restored via actions/cache), failing safe to absolute counts when no baseline is available. When GITHUB_TOKEN and a resolvable PR are present it also posts/updates an Argus PR comment (surfacing new/persisting/resolved counts) and a GitHub Check Run (the same reporting the CI Action produces) — best-effort, never alters the block decision. Use in CI to gate merges — blocked:true when findings meet the blockOn threshold (none/warning/critical, default: critical). Requires Chrome on --remote-debugging-port=9222. GITHUB_TOKEN env var recommended for private repos."
     }
   ]
 }

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "argusqa-os",
-  "version": "9.7.6",
+  "version": "9.8.0",
   "mcpName": "io.github.ironclawdevs27/argus",
   "description": "Argus — AI-powered automated dev-testing platform using Chrome DevTools MCP and Claude Code",
   "keywords": [
@@ -55,7 +55,7 @@
     "test": "npm run test:unit && npm run test:harness",
     "coverage:unit": "vitest run test/unit --coverage",
     "coverage:harness": "c8 npm run test:harness",
-    "coverage:gate": "node scripts/coverage-gate.mjs --min-lines 60 --allow-uncovered src/mcp-server.js,src/orchestration/env-comparison.js,src/server/index.js",
+    "coverage:gate": "node scripts/coverage-gate.mjs --min-lines 70 --allow-uncovered src/mcp-server.js,src/orchestration/env-comparison.js,src/server/index.js",
     "test:coverage": "npm run coverage:harness && npm run coverage:unit && npm run coverage:gate",
     "report:html": "node src/utils/html-reporter.js",
     "report:pdf": "node src/utils/pdf-exporter.js",
@@ -64,9 +64,9 @@
   "dependencies": {
     "@modelcontextprotocol/sdk": "^1.29.0",
     "@opentelemetry/api": "^1.9.1",
-    "@opentelemetry/sdk-node": "^0.218.0",
-    "@slack/web-api": "^7.16.0",
-    "axe-core": "^4.12.0",
+    "@opentelemetry/sdk-node": "^0.219.0",
+    "@slack/web-api": "^7.17.0",
+    "axe-core": "^4.12.1",
     "dotenv": "^17.4.2",
     "express": "^5.2.1",
     "pino": "^10.3.1",
@@ -77,7 +77,7 @@
   },
   "devDependencies": {
     "@vitest/coverage-v8": "^4.1.8",
-    "c8": "^10.1.3",
+    "c8": "^11.0.0",
     "fast-check": "^4.8.0",
     "istanbul-lib-coverage": "^3.2.2",
     "vitest": "^4.1.8"

package/src/cli/pr-validate.js CHANGED Viewed

@@ -25,8 +25,16 @@ import fs   from 'fs';
 import path from 'path';
 import { fileURLToPath }                              from 'url';
 import { createMcpClient }                            from '../utils/mcp-client.js';
-import { crawlRouteCheap }                            from '../orchestration/crawl-and-report.js';
-import { fetchPrFiles, mapFilesToRoutes } from '../utils/pr-diff-analyzer.js';
+import { crawlRouteWithDepth }                         from '../orchestration/crawl-and-report.js';
+import { resolveAuditDepth, selectAnalyzers }          from '../utils/audit-depth.js';
+import { auditRoutesConcurrently, auditRouteWithRetry, routeResilienceFromEnv } from '../utils/parallel-crawler.js';
+import { fetchPrFiles, mapFilesToRoutesDeep, resolveAnnotationTarget } from '../utils/pr-diff-analyzer.js';
+import { resolveTargetUrl }                           from '../utils/deploy-preview.js';
+import { reportPrValidation }                         from '../utils/github-reporter.js';
+import { getCurrentBranch }                            from '../utils/baseline-manager.js';
+import {
+  decidePrBlock, resolvePrBaselineFile, loadPrBaseline, savePrBaseline, tagFindingNovelty, severityTally,
+} from '../utils/pr-baseline.js';
 // ── Exported helpers (testable without Chrome) ────────────────────────────────
@@ -41,10 +49,13 @@ import { fetchPrFiles, mapFilesToRoutes } from '../utils/pr-diff-analyzer.js';
  * @param {Array<object>} opts.findings
  * @param {string[]} opts.changedFiles
  * @param {string} opts.blockOn   critical | warning | none
+ * @param {object} [opts.baseline] baseline-aware diff status (Phase B1). When available:
+ *   { available: true, newCritical, newWarning, newInfo, persisting, resolved }.
+ *   When not: { available: false, note }. Omit entirely for non-baseline callers.
  * @param {string} [opts.error]   top-level error message (startup / fetch failure)
  * @returns {string}
  */
-export function buildStepSummary({ blocked, summary, affectedRoutes, perRoute, findings, changedFiles, blockOn, error }) {
+export function buildStepSummary({ blocked, summary, affectedRoutes, perRoute, findings, changedFiles, blockOn, baseline, error }) {
   const icon   = blocked ? '🔴' : summary.critical + summary.warning === 0 ? '✅' : '⚠️';
   const status = blocked ? 'BLOCKED — merge prevented' : 'PASSED';
@@ -62,6 +73,20 @@ export function buildStepSummary({ blocked, summary, affectedRoutes, perRoute, f
   md += `| Routes audited | ${affectedRoutes.length} |\n`;
   md += `| Files changed | ${changedFiles.length} |\n\n`;
+  // Baseline-aware blocking status (Phase B1). Either the head-vs-base diff counts, or the
+  // fail-safe note when no per-branch baseline was available to diff against.
+  if (baseline) {
+    if (baseline.available) {
+      md += `### Baseline diff\n\n`;
+      md += `Blocking on **new** findings this PR introduces (vs the base-branch baseline).\n\n`;
+      md += `| | 🔴 Critical | ⚠️ Warning | ℹ️ Info |\n|--|------------|-----------|--------|\n`;
+      md += `| New | ${baseline.newCritical} | ${baseline.newWarning} | ${baseline.newInfo} |\n\n`;
+      md += `_${baseline.persisting} persisting · ${baseline.resolved} resolved._\n\n`;
+    } else {
+      md += `> ⚠️ ${baseline.note ?? 'Baseline unavailable — blocking on absolute finding counts.'}\n\n`;
+    }
+  }
   if (perRoute.length > 0) {
     md += `### Route Breakdown\n\n`;
     md += `| Route | 🔴 Critical | ⚠️ Warning | ℹ️ Info |\n|-------|------------|-----------|--------|\n`;
@@ -152,6 +177,37 @@ export function normalizeRoutePaths(routes) {
   });
 }
+/**
+ * All-routes-failed guard (safety-critical — must never false-PASS). True when at least one
+ * route was audited and EVERY audited route errored (the app was unreachable, or every audit
+ * timed out). The caller throws on this → exit 1 → merge blocked. A timed-out route audit is
+ * recorded as a route error (perRoute[i].error set, via auditRouteWithRetry, D4), so a hung
+ * audit feeds this guard and can never become a silent pass. A PARTIAL failure (some routes ok)
+ * does NOT trip the guard: those errors are surfaced in the summary/annotations and the normal
+ * finding-based block decision applies.
+ *
+ * @param {Array<{ error?: string }>} perRoute
+ * @returns {boolean}
+ */
+export function allRoutesFailed(perRoute) {
+  return Array.isArray(perRoute) && perRoute.length > 0 && perRoute.every(r => r && r.error);
+}
+/**
+ * The CLI's audit-decision → process exit-code mapping (the CI merge gate). SAFETY-CRITICAL:
+ * exit 1 blocks the merge, exit 0 allows it. Returns 1 when the PR is blocked (findings at or
+ * above the block-on threshold — the baseline-aware decidePrBlock decision) OR the audit failed
+ * (every route errored / a startup/fetch error — both surface via the catch as `failed`); 0 only
+ * when a completed audit passed. Extracted as a pure exported fn (like allRoutesFailed) so the
+ * full block-decision → exit matrix is unit- + mutation-testable without driving Chrome.
+ *
+ * @param {{ blocked?: boolean, failed?: boolean }} [d]
+ * @returns {0 | 1}
+ */
+export function prExitCode({ blocked = false, failed = false } = {}) {
+  return (blocked || failed) ? 1 : 0;
+}
 // ── Route loader ──────────────────────────────────────────────────────────────
 async function loadRoutes() {
@@ -204,7 +260,6 @@ if (process.argv[1] === _thisFile) {
 async function main() {
   const prUrl     = process.env.ARGUS_PR_URL;
-  const targetUrl = process.env.TARGET_DEV_URL ?? 'http://localhost:3000';
   const blockOn   = (process.env.ARGUS_BLOCK_ON ?? 'critical').toLowerCase().trim();
   const token     = process.env.GITHUB_TOKEN;
@@ -218,16 +273,34 @@ async function main() {
     process.exit(1);
   }
+  // Resolve the audit target (D3 — deploy-preview auto-detection). Prefer a per-PR deploy
+  // preview over the static TARGET_DEV_URL, always degrading gracefully back to it. Default
+  // (no ARGUS_PREVIEW_URL / ARGUS_PREVIEW_DETECT) → TARGET_DEV_URL exactly, with NO extra
+  // network call — byte-identical to the prior behaviour. The head SHA (ARGUS_PR_HEAD_SHA,
+  // the PR head, not the runner's merge-commit GITHUB_SHA) keys the GitHub Deployments probe.
+  const headSha = process.env.ARGUS_PR_HEAD_SHA || process.env.GITHUB_SHA;
+  const { url: targetUrl, source: targetSource } = await resolveTargetUrl({
+    env: process.env, prUrl, headSha, token,
+  });
+  if (targetSource !== 'target-dev-url') {
+    console.log(`[argus] Audit target resolved via ${targetSource}: ${targetUrl}`);
+  }
   let mcp;
   const changedFiles   = [];
   const affectedRoutes = [];
   const allFindings    = [];
   const perRoute       = [];
+  const routeFindings  = [];   // [{ path, findings }] — feeds the baseline-aware diff (B1)
   try {
     // Step 1: Fetch the PR file list from GitHub
     console.log(`[argus] Fetching PR diff: ${prUrl}`);
-    const files = await fetchPrFiles(prUrl, token);
+    // prFiles carries { filename, status, patch } per file; `files` is the filename-only
+    // view used for slug mapping + the string[] output contract. prFiles.patch feeds the
+    // file:line annotations (Phase A3).
+    const prFiles = await fetchPrFiles(prUrl, token);
+    const files   = prFiles.map(f => f.filename);
     changedFiles.push(...files);
     console.log(`[argus] ${files.length} changed file(s)`);
     if (files.length >= 300) {
@@ -236,9 +309,14 @@ async function main() {
       console.log('::warning::PR has 300+ changed files — Argus analyzed the first 300. Routes affected by later files may be missed.');
     }
-    // Step 2: Map changed files to affected routes
+    // Step 2: Map changed files to affected routes. When ARGUS_SOURCE_DIR points at the
+    // checked-out app source, mapFilesToRoutesDeep (C1) maps a changed component to only the
+    // routes whose page files import it (static import graph + Next.js convention); without
+    // it — or on any resolution ambiguity — it falls back to the conservative slug heuristic
+    // (all routes on no-match), so a regression is never narrowed away. Opt-in: unset
+    // ARGUS_SOURCE_DIR keeps the prior slug-only behaviour exactly.
     const routes   = await loadRoutes();
-    const affected = mapFilesToRoutes(files, routes);
+    const affected = mapFilesToRoutesDeep(files, routes, { sourceDir: process.env.ARGUS_SOURCE_DIR });
     affectedRoutes.push(...affected);
     if (affected.length === 0) {
@@ -263,10 +341,34 @@ async function main() {
     mcp = await createMcpClient();
     console.log('[argus] Chrome connected.');
-    // Step 5: Audit each affected route via crawlRouteCheap
+    // INTENTIONAL DIVERGENCE from the MCP tool (src/mcp-server.js handlePrValidate): the CLI
+    // audits a routes-file (CI safety + speed); the MCP tool audits config/targets.js (dev
+    // convenience). That ROUTE-SOURCE divergence is by design (PR_VALIDATOR A4/E4). Everything
+    // downstream of the route list is SHARED so the two paths agree (E4 — CLI↔MCP parity):
+    //   • AUDIT DEPTH does NOT diverge — both run crawlRouteCheap by default and share ONE opt-in
+    //     depth policy (ARGUS_PR_AUDIT_DEPTH → selectAnalyzers, D2).
+    //   • The BLOCK DECISION is the shared decidePrBlock, fed a summary built by the shared
+    //     severityTally (Step 6 below) — for the same findings + baseline + blockOn the two paths
+    //     reach the IDENTICAL `blocked`/reason (decidePrBlock owns the none|warning|critical matrix
+    //     AND normalizes blockOn casing, so neither path re-implements or pre-normalizes the gate).
+    //   • Both paths REPORT through the SAME shared helper — reportPrValidation (Step 9 below).
+    //
+    // Step 5: Audit each affected route via crawlRouteWithDepth (cheap pass + the selected
+    // expensive analyzers, if any — see Step 5 depth resolution below).
     // Preserve path prefix (e.g. /project/ in GitHub Pages) — .origin would strip it
     const baseUrl = targetUrl.replace(/\/$/, '');
+    // Selective analyzer depth (D2). The shared policy maps ARGUS_PR_AUDIT_DEPTH + the PR's
+    // changed file types to the expensive analyzers to also run on each affected route.
+    // Default 'cheap' → empty list → crawlRouteWithDepth returns the crawlRouteCheap result
+    // unchanged (byte-identical to before). Computed once per PR (selection is per-PR, not
+    // per-route — every route gets the same depth).
+    const auditDepth     = resolveAuditDepth(process.env.ARGUS_PR_AUDIT_DEPTH);
+    const depthAnalyzers = selectAnalyzers({ depth: auditDepth, changedFiles: files });
+    if (depthAnalyzers.length > 0) {
+      console.log(`[argus] Audit depth: ${auditDepth} → also running expensive analyzers: ${depthAnalyzers.join(', ')}`);
+    }
     // Normalize route paths — crawlRouteCheap builds URLs via string concat (baseUrl + route.path)
     // so paths without a leading slash produce malformed URLs like https://example.comlogin
     const normalizedAffected = normalizeRoutePaths(affected).map((r, i) => {
@@ -276,63 +378,161 @@ async function main() {
       return r;
     });
-    for (const route of normalizedAffected) {
-      const url = `${baseUrl}${route.path}`;
-      console.log(`[argus] → Auditing ${url}`);
+    // Audit each affected route. Bounded-concurrency by ARGUS_CONCURRENCY (default 1 = sequential,
+    // byte-identical to the prior loop). Each parallel lane gets its OWN Chrome client —
+    // crawlRouteCheap mutates page-navigation state, so concurrent crawls must never share a
+    // connection — and auditRoutesConcurrently returns the per-route results in ROUTE order, so the
+    // aggregate findings + the baseline-aware block decision are identical to a sequential run (only
+    // wall-clock changes). Mirrors the orchestrator's parallel route crawling (D7.3).
+    const rawConcurrency = parseInt(process.env.ARGUS_CONCURRENCY ?? '1', 10);
+    const concurrency    = Math.min(10, Math.max(1, Number.isNaN(rawConcurrency) ? 1 : rawConcurrency));
+    if (concurrency > 1) {
+      console.log(`[argus] Parallel mode: concurrency=${concurrency} over ${normalizedAffected.length} route(s)`);
+    }
-      try {
-        const raw      = await crawlRouteCheap(route, baseUrl, mcp);
-        const findings = Array.isArray(raw.errors) ? raw.errors : [];
-        allFindings.push(...findings);
+    // Per-route timeout + retry (D4). Each route audit is bounded by ARGUS_ROUTE_TIMEOUT_MS
+    // (default 120000 ms) and optionally retried ARGUS_ROUTE_RETRIES times. A timed-out audit
+    // throws → it is recorded as a route ERROR below (ok:false), feeding the all-routes-failed
+    // guard — a hung audit can never silently pass. The bound only ever BLOCKS, never PASSES.
+    const { timeoutMs: routeTimeoutMs, retries: routeRetries } = routeResilienceFromEnv();
+    if (routeTimeoutMs > 0 || routeRetries > 0) {
+      console.log(`[argus] Per-route audit: timeout ${routeTimeoutMs > 0 ? `${routeTimeoutMs}ms` : 'off'}, ${routeRetries} retr${routeRetries === 1 ? 'y' : 'ies'}`);
+    }
-        const critical = findings.filter(f => f.severity === 'critical').length;
-        const warning  = findings.filter(f => f.severity === 'warning').length;
-        const info     = findings.filter(f => f.severity === 'info').length;
-        perRoute.push({ route: route.path, critical, warning, info });
+    const routeResults = await auditRoutesConcurrently(normalizedAffected, {
+      concurrency,
+      primaryClient: mcp,
+      createClient:  createMcpClient,
+      crawlRoute: async (route, client) => {
+        const url = `${baseUrl}${route.path}`;
+        console.log(`[argus] → Auditing ${url}`);
+        try {
+          const raw = await auditRouteWithRetry(
+            () => crawlRouteWithDepth(route, baseUrl, client, depthAnalyzers),
+            {
+              timeoutMs: routeTimeoutMs,
+              retries:   routeRetries,
+              label:     `Route audit ${route.path}`,
+              onRetry:   (attempt, err) =>
+                console.log(`::warning::Route ${route.path} audit attempt ${attempt} failed (${String(err.message).replace(/\n/g, ' ')}) — retrying`),
+            },
+          );
+          return { route, ok: true, raw };
+        } catch (routeErr) {
+          return { route, ok: false, error: routeErr.message };
+        }
+      },
+    });
-        console.log(`[argus]   ${url}: ${critical} critical, ${warning} warning, ${info} info`);
+    // Aggregate the ordered results sequentially — bookkeeping + annotations stay deterministic and
+    // identical to the prior sequential loop (results are in route order regardless of completion).
+    for (const item of routeResults) {
+      const route = item.route;
+      const url   = `${baseUrl}${route.path}`;
-        // Emit inline GitHub Actions annotations for visible CI feedback
-        for (const f of findings.filter(g => g.severity === 'critical')) {
-          console.log(`::error::${String(f.message ?? '').replace(/\n/g, ' ')} [${f.type}] on ${url}`);
-        }
-        for (const f of findings.filter(g => g.severity === 'warning')) {
-          console.log(`::warning::${String(f.message ?? '').replace(/\n/g, ' ')} [${f.type}] on ${url}`);
-        }
+      if (!item.ok) {
+        console.error(`::warning::Audit failed for ${url}: ${item.error}`);
+        perRoute.push({ route: route.path, critical: 0, warning: 0, info: 0, error: item.error });
+        continue;
+      }
-      } catch (routeErr) {
-        console.error(`::warning::Audit failed for ${url}: ${routeErr.message}`);
-        perRoute.push({ route: route.path, critical: 0, warning: 0, info: 0, error: routeErr.message });
+      const findings = Array.isArray(item.raw.errors) ? item.raw.errors : [];
+      allFindings.push(...findings);
+      routeFindings.push({ path: route.path, findings });
+      const critical = findings.filter(f => f.severity === 'critical').length;
+      const warning  = findings.filter(f => f.severity === 'warning').length;
+      const info     = findings.filter(f => f.severity === 'info').length;
+      perRoute.push({ route: route.path, critical, warning, info });
+      console.log(`[argus]   ${url}: ${critical} critical, ${warning} warning, ${info} info`);
+      // Emit inline GitHub Actions annotations for visible CI feedback (Phase A3).
+      // When a changed file SPECIFICALLY maps to this route and has a real added line in
+      // its patch, the annotation is anchored at file=path,line=N so it renders inline on
+      // the PR "Files changed" tab; otherwise it stays a route-level annotation. The line
+      // is never fabricated — resolveAnnotationTarget returns null unless the line is a
+      // genuine diff line (see pr-diff-analyzer.js).
+      const annTarget = resolveAnnotationTarget(route.path, prFiles);
+      const loc = annTarget ? ` file=${annTarget.path},line=${annTarget.line}` : '';
+      for (const f of findings.filter(g => g.severity === 'critical')) {
+        console.log(`::error${loc}::${String(f.message ?? '').replace(/\n/g, ' ')} [${f.type}] on ${url}`);
+      }
+      for (const f of findings.filter(g => g.severity === 'warning')) {
+        console.log(`::warning${loc}::${String(f.message ?? '').replace(/\n/g, ' ')} [${f.type}] on ${url}`);
       }
     }
-    // Guard: if every route failed with an exception, the app was unreachable after
-    // the preflight check (e.g. race condition where app died between check and crawl).
-    // Throwing here causes the step to exit 1, which correctly blocks the merge.
-    const routeFailCount = perRoute.filter(r => r.error).length;
-    if (routeFailCount > 0 && routeFailCount === perRoute.length) {
+    // Guard: if EVERY audited route errored, the app was unreachable after the preflight check
+    // (e.g. the app died between check and crawl) or every audit timed out (D4). Throwing here
+    // exits 1, which correctly blocks the merge — a hung/unreachable app never false-passes. The
+    // decision lives in the exported allRoutesFailed() so it is unit- + mutation-testable.
+    if (allRoutesFailed(perRoute)) {
       throw new Error(
-        `All ${perRoute.length} route audit(s) failed — Chrome could not reach the app. ` +
+        `All ${perRoute.length} route audit(s) failed — Chrome could not reach the app or every audit timed out. ` +
         `Ensure TARGET_DEV_URL is accessible throughout the job. ` +
         `First error: ${perRoute[0].error}`,
       );
     }
-    // Step 6: Compute aggregate summary and merge-block decision
-    const summary = {
-      critical: allFindings.filter(f => f.severity === 'critical').length,
-      warning:  allFindings.filter(f => f.severity === 'warning').length,
-      info:     allFindings.filter(f => f.severity === 'info').length,
-    };
+    // Step 6: Compute the aggregate (absolute) severity summary that feeds the block decision.
+    // Built via the shared severityTally so the CLI and the MCP tool (handlePrValidate) construct
+    // the decidePrBlock `summary` input IDENTICALLY — the two paths cannot diverge on the block
+    // decision for the same findings (PR_VALIDATOR plan E4 — CLI↔MCP parity).
+    const summary = severityTally(allFindings);
+    // Step 6a: Baseline-aware merge-block decision (Phase B1). Diff the PR-head findings
+    // against the stored base-branch baseline (restored via the actions/cache pattern, keyed
+    // on GITHUB_BASE_REF) and gate on the findings this PR INTRODUCES. Fail safe: when no
+    // baseline is resolvable the decision blocks on absolute counts (pre-B1 behaviour) and
+    // the step summary says so — it never silently passes a broken app. The block-on matrix
+    // (none|warning|critical) lives in the shared decidePrBlock, so the CLI and the MCP tool
+    // (handlePrValidate) cannot diverge on the block semantics.
+    const outputDir    = process.env.REPORT_OUTPUT_DIR || './reports';
+    const baselineFile = resolvePrBaselineFile({ outputDir });
+    const baseline     = baselineFile ? loadPrBaseline(baselineFile) : null;
+    const decision     = decidePrBlock({ routeFindings, summary, blockOn, baseline });
+    const blocked      = decision.blocked;
+    // B2: tag each finding new-vs-persisting off the same baseline, so the PR comment surfaces
+    // ONLY the findings this PR introduced (tags ride on the shared objects in result.findings).
+    tagFindingNovelty(routeFindings, baseline);
+    const baselineInfo = decision.baselineAvailable
+      ? {
+          available:   true,
+          newCritical: decision.newSummary.critical,
+          newWarning:  decision.newSummary.warning,
+          newInfo:     decision.newSummary.info,
+          persisting:  decision.persistingCount,
+          resolved:    decision.resolvedCount,
+        }
+      : { available: false, note: decision.note };
+    if (decision.baselineAvailable) {
+      console.log(`[argus] Baseline diff (${baselineFile}): ${baselineInfo.newCritical} critical / ${baselineInfo.newWarning} warning new, ${baselineInfo.persisting} persisting, ${baselineInfo.resolved} resolved`);
+    } else {
+      console.log(`[argus] ${decision.note}`);
+    }
-    const blocked =
-      blockOn === 'critical' ? summary.critical > 0 :
-      blockOn === 'warning'  ? summary.critical + summary.warning > 0 :
-      false;
+    // Step 6b: Optionally update this branch's baseline (ARGUS_UPDATE_BASELINE). A base-branch
+    // run uses this to populate the cache the PR runs diff against; default off → no write,
+    // no behaviour change.
+    if (/^(1|true|yes|on)$/i.test(process.env.ARGUS_UPDATE_BASELINE || '')) {
+      try {
+        const writeFile = resolvePrBaselineFile({ outputDir, baseRef: getCurrentBranch() });
+        if (writeFile) {
+          savePrBaseline(writeFile, routeFindings);
+          console.log(`[argus] Baseline updated: ${writeFile}`);
+        }
+      } catch (baseErr) {
+        console.error(`::warning::Argus baseline write failed: ${baseErr.message}`);
+      }
+    }
     // Step 7: Write GitHub Actions outputs and step summary
     writeGithubOutputs({ blocked, summary, affectedRoutes: normalizedAffected });
-    writeStepSummary(buildStepSummary({ blocked, summary, affectedRoutes: normalizedAffected, perRoute, findings: allFindings, changedFiles: files, blockOn }));
+    writeStepSummary(buildStepSummary({ blocked, summary, affectedRoutes: normalizedAffected, perRoute, findings: allFindings, changedFiles: files, blockOn, baseline: baselineInfo }));
     // Step 8: Emit JSON result to stdout for downstream pipeline steps
     const result = {
@@ -344,26 +544,45 @@ async function main() {
       summary,
       blocked,
       blockOn,
+      baseline: baselineInfo,
     };
     console.log(JSON.stringify(result, null, 2));
-    if (blocked) {
-      const blockReason = blockOn === 'warning'
-        ? `${summary.critical} critical + ${summary.warning} warning finding(s) found`
-        : `${summary.critical} critical finding(s) found`;
-      console.error(`::error::Argus PR Validator: ${blockReason}. Merge blocked (block-on=${blockOn}).`);
-      process.exit(1);
+    // Step 9: Post/update the Argus PR comment + Check Run (Phase A). Idempotent — updates
+    // the single marker-tagged comment in place; the Check Run conclusion maps to the block
+    // decision. Reporting is best-effort: a missing token or a GitHub API failure must never
+    // crash the run or change the merge decision (the exit code below is the real gate).
+    try {
+      const reported = await reportPrValidation(result, { prUrl });
+      if (reported.posted) {
+        console.log(`[argus] PR comment posted/updated.${reported.checked ? ' Check Run completed.' : ''}`);
+      } else {
+        console.log(`[argus] PR reporting skipped: ${reported.reason}`);
+      }
+    } catch (reportErr) {
+      console.error(`::warning::Argus PR reporting failed: ${reportErr.message}`);
     }
-    console.log(`[argus] ✓ Audit passed — ${summary.critical} critical, ${summary.warning} warning, ${summary.info} info.`);
-    process.exit(0);
+    // The block decision → exit code (the merge gate) goes through the single prExitCode
+    // mapping so the CLI and its tests can never disagree on what a decision exits with.
+    const exitCode = prExitCode({ blocked });
+    if (blocked) {
+      // decision.reason is non-null when blocked and reflects NEW counts when a baseline was
+      // available, ABSOLUTE counts otherwise (see decidePrBlock — "new" vs "total").
+      console.error(`::error::Argus PR Validator: ${decision.reason}. Merge blocked (block-on=${blockOn}).`);
+    } else {
+      console.log(`[argus] ✓ Audit passed — ${summary.critical} critical, ${summary.warning} warning, ${summary.info} info.`);
+    }
+    process.exit(exitCode);
   } catch (err) {
     const summary = { critical: 0, warning: 0, info: 0 };
     console.error(`::error::Argus PR validation failed: ${err.message}`);
     writeGithubOutputs({ blocked: false, summary, affectedRoutes: [] });
     writeStepSummary(buildStepSummary({ blocked: false, summary, affectedRoutes: [], perRoute: [], findings: [], changedFiles, blockOn, error: err.message }));
-    process.exit(1);
+    // A failed audit (all routes errored, or a startup/fetch error) is conservative: exit 1 =
+    // merge blocked, never a false PASS. Routed through prExitCode for one exit-code source.
+    process.exit(prExitCode({ failed: true }));
   } finally {
     if (mcp) {