PyPI - websec-validator - Versions diff - 0.3.0.tar.gz → 0.4.1.tar.gz

Files changed (74)

{websec_validator-0.3.0/src/websec_validator.egg-info → websec_validator-0.4.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: websec-validator
-Version: 0.3.0
+Version: 0.4.1
 Summary: Local-first security recon that briefs your AI coding agent: facts + tailored probe scripts, code-in / artifacts-out. No LLM, no server, no running app.
 Author: Ricardo Accioly
 License: MIT
@@ -82,7 +82,7 @@ Then point your agent at the output: **"Read `websec-out/AGENT-BRIEFING.md` and
 > That's the whole user surface: **`run`** (plus the optional, advanced **`dynamic`** live-probing step below). `recon`/`proof`/`calibrate` exist for developing the tool itself and are hidden from `--help` — you never need them.
-## What it extracts (13 deterministic extractors, no LLM)
+## What it extracts (15 deterministic extractors, no LLM)
 | | Dimension | Notable output |
 |---|---|---|
@@ -91,13 +91,15 @@ Then point your agent at the output: **"Read `websec-out/AGENT-BRIEFING.md` and
 | auth | scheme + login surface + **insecure-default signing secrets** | multi-scheme; flags a hard-coded `JWT_SECRET \|\| 'dev-secret'` fallback (forgeable JWT) |
 | **authz** | access-control map | guard coverage + **write endpoints with no visible guard** + roles |
 | tenant | multi-tenancy key candidates | the BOLA boundary, by frequency |
-| **password_policy** | cross-route policy consistency | flags a route enforcing fewer character classes than the strongest sibling (policy drift) |
-| surface | 14 sink classes | 12 user-input-gated (SSRF/SQLi/traversal/SSTI/…) **+ var-arg SSRF + response-side error-disclosure** |
+| **password_policy** | cross-route consistency **+ reuse/history** | complexity drift across routes **+ a set-password path that hashes without a reuse check** |
+| surface | 14 sink classes **+ redirect-SSRF** | user-input-gated sinks + var-arg SSRF + error-disclosure **+ follows-redirects-without-per-hop-guard** |
+| **upload_security** | unrestricted upload + unsafe serve | deny-list-only, stored-name-from-filename, trust-client-MIME, accept-SVG, **serve without `nosniff`** |
 | schemas | data models + **privileged fields** | Pydantic/SQLAlchemy/Django/Prisma/Mongoose/TypeORM/Zod → `role`/`isAdmin`/`groupId` for mass-assignment targeting |
-| iac_ci | IaC + CI/CD | GHA injection, unpinned actions, Dockerfile-root, tfstate **+ CDK AppSync `API_KEY` default-auth (CSWSH)** |
+| iac_ci | IaC + CI/CD | GHA injection, unpinned actions, tfstate, **CDK AppSync `API_KEY` anonymous-default-auth + WAF-as-control smell** |
 | client_exposure | browser leakage | public-var secrets by **name + value-shape (`da2-…`) + CDK build-injection**, server-secret-in-client, source maps |
-| **client_integrity** | tamperable display (man-in-the-browser) | a fund-redirecting value (wallet address/QR) shown without a strict CSP / out-of-band anchor |
-| graphql | GraphQL surface | introspection / playground / depth-limit **+ AppSync subscription-authz (cross-group BOLA) + WAF-bypass-aware introspection** |
+| **client_integrity** | tamperable display + **WS auth model** | wallet value without strict CSP / out-of-band anchor **+ the CSWSH determinant (ambient-cookie WS auth)** |
+| **pii_exposure** | unmasked PII at the output boundary | `res.json(rawEntity)` with PII + **a masking control defined but with zero live call sites** (value-shape, not field-name) |
+| graphql | GraphQL surface | introspection (**AppSync `introspectionConfig: DISABLED`-aware**) / playground / depth-limit **+ AppSync subscription-authz (cross-group BOLA)** |
 | integrations | third-party + webhooks | webhooks missing signature verification |
 Plus **derived targeting** — IDOR / SSRF / open-redirect / upload / write / auth-endpoint
@@ -206,13 +208,15 @@ publisher** with project `websec-validator`, owner `raccioly`, repo `websec-vali
 ## Status / roadmap
-**Done:** 13-extractor recon (incl. schema/entity → mass-assignment targeting, the **AWS-CDK /
-managed-AppSync / VTL boundary** — CSWSH, cross-group subscription BOLA, forgeable-JWT default
-secrets — and a **man-in-the-browser / tamperable-display** class), cross-tool de-dup + **bundled
-Semgrep rules**, tailored probe staging, agent briefing, traceable findings ledger with **calibrated
-confidence (CJE — Wilson CIs)**, proof harness, test suite, **Docker bundle** (all scanners + Noir,
-arch-aware), **dynamic phase v1** (authenticated read-only cross-tenant BOLA — validated live,
-reproduced a hand-pentest's 14/14).
+**Done:** 15-extractor recon (incl. schema/entity → mass-assignment targeting, the **AWS-CDK /
+managed-AppSync / VTL boundary**, **upload-security** + **PII-output-boundary** + **redirect-SSRF**
++ **password-reuse** classes, and a **man-in-the-browser / tamperable-display** class), cross-tool
+de-dup + **bundled Semgrep rules**, tailored probe staging, agent briefing, traceable findings ledger
+with **calibrated confidence (CJE — Wilson CIs)**, proof harness, test suite, **Docker bundle** (all
+scanners + Noir, arch-aware), **dynamic phase v1** (authenticated read-only cross-tenant BOLA —
+validated live, reproduced a hand-pentest's 14/14). Validated against the **PTREQ0013000 pen test +
+retest** (incl. correcting two findings the retest disproved: AppSync introspection *is* disablable
+engine-level, and API_KEY-default is anonymous-auth, not CSWSH).
 **Next:** dynamic write-verb BOLA + JWT/auth probes + ZAP/Nuclei two-role diff (gated, they mutate),
 calibration on hand-labeled real repos (more representative base rate), ASVS index lookup, optional
 model-SDK adapters for no-agent fallback.

{websec_validator-0.3.0 → websec_validator-0.4.1}/README.md RENAMED

@@ -70,7 +70,7 @@ Then point your agent at the output: **"Read `websec-out/AGENT-BRIEFING.md` and
 > That's the whole user surface: **`run`** (plus the optional, advanced **`dynamic`** live-probing step below). `recon`/`proof`/`calibrate` exist for developing the tool itself and are hidden from `--help` — you never need them.
-## What it extracts (13 deterministic extractors, no LLM)
+## What it extracts (15 deterministic extractors, no LLM)
 | | Dimension | Notable output |
 |---|---|---|
@@ -79,13 +79,15 @@ Then point your agent at the output: **"Read `websec-out/AGENT-BRIEFING.md` and
 | auth | scheme + login surface + **insecure-default signing secrets** | multi-scheme; flags a hard-coded `JWT_SECRET \|\| 'dev-secret'` fallback (forgeable JWT) |
 | **authz** | access-control map | guard coverage + **write endpoints with no visible guard** + roles |
 | tenant | multi-tenancy key candidates | the BOLA boundary, by frequency |
-| **password_policy** | cross-route policy consistency | flags a route enforcing fewer character classes than the strongest sibling (policy drift) |
-| surface | 14 sink classes | 12 user-input-gated (SSRF/SQLi/traversal/SSTI/…) **+ var-arg SSRF + response-side error-disclosure** |
+| **password_policy** | cross-route consistency **+ reuse/history** | complexity drift across routes **+ a set-password path that hashes without a reuse check** |
+| surface | 14 sink classes **+ redirect-SSRF** | user-input-gated sinks + var-arg SSRF + error-disclosure **+ follows-redirects-without-per-hop-guard** |
+| **upload_security** | unrestricted upload + unsafe serve | deny-list-only, stored-name-from-filename, trust-client-MIME, accept-SVG, **serve without `nosniff`** |
 | schemas | data models + **privileged fields** | Pydantic/SQLAlchemy/Django/Prisma/Mongoose/TypeORM/Zod → `role`/`isAdmin`/`groupId` for mass-assignment targeting |
-| iac_ci | IaC + CI/CD | GHA injection, unpinned actions, Dockerfile-root, tfstate **+ CDK AppSync `API_KEY` default-auth (CSWSH)** |
+| iac_ci | IaC + CI/CD | GHA injection, unpinned actions, tfstate, **CDK AppSync `API_KEY` anonymous-default-auth + WAF-as-control smell** |
 | client_exposure | browser leakage | public-var secrets by **name + value-shape (`da2-…`) + CDK build-injection**, server-secret-in-client, source maps |
-| **client_integrity** | tamperable display (man-in-the-browser) | a fund-redirecting value (wallet address/QR) shown without a strict CSP / out-of-band anchor |
-| graphql | GraphQL surface | introspection / playground / depth-limit **+ AppSync subscription-authz (cross-group BOLA) + WAF-bypass-aware introspection** |
+| **client_integrity** | tamperable display + **WS auth model** | wallet value without strict CSP / out-of-band anchor **+ the CSWSH determinant (ambient-cookie WS auth)** |
+| **pii_exposure** | unmasked PII at the output boundary | `res.json(rawEntity)` with PII + **a masking control defined but with zero live call sites** (value-shape, not field-name) |
+| graphql | GraphQL surface | introspection (**AppSync `introspectionConfig: DISABLED`-aware**) / playground / depth-limit **+ AppSync subscription-authz (cross-group BOLA)** |
 | integrations | third-party + webhooks | webhooks missing signature verification |
 Plus **derived targeting** — IDOR / SSRF / open-redirect / upload / write / auth-endpoint
@@ -194,13 +196,15 @@ publisher** with project `websec-validator`, owner `raccioly`, repo `websec-vali
 ## Status / roadmap
-**Done:** 13-extractor recon (incl. schema/entity → mass-assignment targeting, the **AWS-CDK /
-managed-AppSync / VTL boundary** — CSWSH, cross-group subscription BOLA, forgeable-JWT default
-secrets — and a **man-in-the-browser / tamperable-display** class), cross-tool de-dup + **bundled
-Semgrep rules**, tailored probe staging, agent briefing, traceable findings ledger with **calibrated
-confidence (CJE — Wilson CIs)**, proof harness, test suite, **Docker bundle** (all scanners + Noir,
-arch-aware), **dynamic phase v1** (authenticated read-only cross-tenant BOLA — validated live,
-reproduced a hand-pentest's 14/14).
+**Done:** 15-extractor recon (incl. schema/entity → mass-assignment targeting, the **AWS-CDK /
+managed-AppSync / VTL boundary**, **upload-security** + **PII-output-boundary** + **redirect-SSRF**
++ **password-reuse** classes, and a **man-in-the-browser / tamperable-display** class), cross-tool
+de-dup + **bundled Semgrep rules**, tailored probe staging, agent briefing, traceable findings ledger
+with **calibrated confidence (CJE — Wilson CIs)**, proof harness, test suite, **Docker bundle** (all
+scanners + Noir, arch-aware), **dynamic phase v1** (authenticated read-only cross-tenant BOLA —
+validated live, reproduced a hand-pentest's 14/14). Validated against the **PTREQ0013000 pen test +
+retest** (incl. correcting two findings the retest disproved: AppSync introspection *is* disablable
+engine-level, and API_KEY-default is anonymous-auth, not CSWSH).
 **Next:** dynamic write-verb BOLA + JWT/auth probes + ZAP/Nuclei two-role diff (gated, they mutate),
 calibration on hand-labeled real repos (more representative base rate), ASVS index lookup, optional
 model-SDK adapters for no-agent fallback.

{websec_validator-0.3.0 → websec_validator-0.4.1}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "websec-validator"
-version = "0.3.0"
+version = "0.4.1"
 description = "Local-first security recon that briefs your AI coding agent: facts + tailored probe scripts, code-in / artifacts-out. No LLM, no server, no running app."
 readme = "README.md"
 requires-python = ">=3.11"

{websec_validator-0.3.0 → websec_validator-0.4.1}/src/websec_validator/__init__.py RENAMED

@@ -1,11 +1,15 @@
 """websec-validator — local-first security recon that briefs an AI coding agent.
 The tool does the deterministic half (read the repo, run the scanners it finds,
-stage the probe library tailored to what it discovered) and emits three artifacts:
+stage the probe library tailored to what it discovered) and emits, per immutable run:
-  1. findings.json    — de-duplicated static scanner results
-  2. FACTS.json       — stack, routes, auth-model candidates, attack surface
-  3. AGENT-BRIEFING.md — marching orders + staged probe scripts for your AI agent
+  1. FACTS.json          — stack, routes, auth-model candidates, attack surface
+  2. findings.json       — de-duplicated static scanner results (when --scan)
+  3. findings-ledger.json — ranked, standards-cited, calibrated findings (recon + static + dynamic)
+  4. AGENT-BRIEFING.md   — marching orders + the per-attack-class targeting
+  5. REPORT.md           — the human-readable historical record
+  6. CONSTITUTION.md     — the app's security invariants as checkable Given/When/Then
+  7. probes/             — the probe library staged against THIS app's real surface
 It never calls an LLM, never runs a server, and never needs a running instance of
 the target app. Running the probes and applying fixes is the agent + human's job.

{websec_validator-0.3.0 → websec_validator-0.4.1}/src/websec_validator/briefing.py RENAMED

@@ -69,6 +69,20 @@ def render(facts: dict, scanners: dict, scan_results: list, probe_manifest: list
         pp_line = f"looks consistent across {len(pp['password_blocks'])} validator block(s)"
     else:
         pp_line = "_no password validators detected_"
+    if ((pp.get("password_reuse") or {}).get("gap")):
+        pp_line += "  ·  ⚠ NO reuse/history control (#6)"
+    up = facts.get("upload_security", {})
+    up_findings = up.get("findings", [])
+    up_section = ("\n".join(f"- **{f.get('severity')}** {f.get('kind')} — `{f.get('file')}`" for f in up_findings[:20])
+                  if up_findings else
+                  ("_upload handler(s) present; allow-list + nosniff look ok — spot-check_" if up.get("upload_handlers")
+                   else "_no upload handlers detected_"))
+    pii = facts.get("pii_exposure", {})
+    pii_findings = pii.get("findings", [])
+    pii_section = ("\n".join(f"- **{f.get('severity')}** {f.get('kind')} — `{f.get('file')}`" for f in pii_findings[:20])
+                   if pii_findings else "_no obvious raw-PII responses / dead masking controls_")
+    ws_line = (facts.get("client_integrity", {}) or {}).get("websocket_auth", "no websocket detected")
     gql = facts.get("graphql", {})
     if gql.get("present"):
@@ -113,6 +127,12 @@ def render(facts: dict, scanners: dict, scan_results: list, probe_manifest: list
     endpoints = routes.get("endpoints", [])
     inventory = _bullets([f"`{e['method']:6}` {e['path']}" for e in endpoints], cap=80)
+    partial_banner = (
+        f"\n> ⚠️ **PARTIAL SCAN** — the walker stopped at the {facts.get('file_cap','?')}-file cap "
+        f"({facts.get('files_scanned','?')} files read, filesystem order), so recon may be INCOMPLETE on "
+        "this repo. Re-run scoped to a subdirectory or with `--exclude` to cover the rest before trusting "
+        "an absence of findings.\n" if facts.get("files_truncated") else "")
     return f"""# AGENT BRIEFING — security pass for `{facts.get('target','')}`
 > Generated by **websec-validator v{facts.get('version','')}** — deterministic recon, no LLM.
@@ -127,7 +147,7 @@ def render(facts: dict, scanners: dict, scan_results: list, probe_manifest: list
 ⚠️ Static findings + recon need **no running app**. The probes need a **live test instance + test
 credentials** — ask the human, never fabricate, never hit production.
+{partial_banner}
 ---
 ## 1. What this app is (detected)
@@ -181,6 +201,14 @@ Production source maps exposed: {client.get("production_source_maps", False)}
 **Client integrity — man-in-the-browser / tamperable display:**
 {ci_section}
+**WebSocket auth model (CSWSH determinant — is it an ambient cookie?):** {ws_line}
+**File-upload security (#2b — sniff bytes, derive stored name, nosniff on serve):**
+{up_section}
+**PII output boundary (#8 — verify by VALUE SHAPE, not field name):**
+{pii_section}
 **Third-party integrations:** {integ_line}
 {wh_line}

{websec_validator-0.3.0 → websec_validator-0.4.1}/src/websec_validator/cli.py RENAMED

@@ -134,9 +134,11 @@ def cmd_run(args) -> int:
     # 5. briefing + comprehensive REPORT.md (immutable run record)
     (out / "AGENT-BRIEFING.md").write_text(briefing.render(facts, det, scan_results, manifest, unified))
     (out / "REPORT.md").write_text(report.render(facts, det, scan_results, unified, manifest, ts, ledger))
+    # drop the full `all` finding list from the manifest — it's a duplicate of findings.json
+    manifest_summary = {k: v for k, v in unified.items() if k != "all"} if unified else None
     (out / "manifest.json").write_text(json.dumps(
         {"facts": "FACTS.json", "scanners": det, "scan_results": scan_results,
-         "findings_summary": unified, "ledger": {"total": ledger["total"], "by_severity": ledger["by_severity"]},
+         "findings_summary": manifest_summary, "ledger": {"total": ledger["total"], "by_severity": ledger["by_severity"]},
          "probes": manifest, "timestamp": ts}, indent=2))
     print(f"\n✓ run {ts} saved (immutable — nothing overwritten):\n    {out}")
@@ -327,6 +329,9 @@ def _which(b):
 def _print_facts_summary(facts: dict) -> None:
+    if facts.get("files_truncated"):
+        print(f"  ⚠ PARTIAL SCAN — hit the {facts.get('file_cap', '?')}-file cap; recon may be incomplete. "
+              "Narrow with --exclude or scan a subdirectory.")
     st = facts.get("stack", {})
     rt = facts.get("routes", {})
     tg = rt.get("targeting", {})
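
Reviewer sketch (not part of the package): the `manifest_summary` change in `cmd_run` can be read as a small filter that keeps every summary key and drops only the bulky `all` list. The helper name `summarize_findings` and the sample keys `total` / `by_severity` inside `unified` are assumptions for illustration; only the dict comprehension itself comes from the diff.

```python
import json


def summarize_findings(unified):
    """Copy the unified-findings dict without its 'all' list; None if there is nothing to copy."""
    return {k: v for k, v in unified.items() if k != "all"} if unified else None


# Hypothetical shape of `unified`; the real keys are not shown in the diff.
unified = {
    "total": 2,
    "by_severity": {"HIGH": 1, "LOW": 1},
    "all": [{"id": "f-1"}, {"id": "f-2"}],
}

summary = summarize_findings(unified)
print(json.dumps(summary, indent=2))
```

Note that an empty dict and `None` both map to `None`, because the conditional tests truthiness rather than `is not None`.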

{websec_validator-0.3.0 → websec_validator-0.4.1}/src/websec_validator/dynamic.py RENAMED

@@ -106,7 +106,9 @@ def cross_tenant_bola(cfg: dict, facts: dict) -> dict:
     for path in endpoints:
         # attacker A tries to read B's tenant data, and vice-versa
         for atk, vic, direction in ((a, b, "A→B"), (b, a, "B→A")):
-            url = cfg["target"] + path.replace("{" + param + "}", vic["tenant"])
+            # str(): a tenant id is often numeric (auto-increment) — str.replace's 2nd arg must be a
+            # str, so a JSON int would crash this (uncaught) authenticated path.
+            url = cfg["target"] + path.replace("{" + param + "}", str(vic["tenant"]))
             code, body = _request("GET", url, atk["token"])
             if code in (401, 403, 404):
                 verdict = "blocked"
@@ -164,7 +166,9 @@ def unauth_reachability(target: str, facts: dict, max_endpoints: int = 50) -> di
         if e.get("method") != "GET" or "{" in p or SIDE_EFFECTING.search(p):
             continue
         eps.append(p)
-    eps = sorted(set(eps))[:max_endpoints]
+    _all_eps = sorted(set(eps))
+    eps = _all_eps[:max_endpoints]
+    over_cap = max(0, len(_all_eps) - max_endpoints)   # disclose, don't silently drop (a missed endpoint = a missed lead)
     results, skipped = [], [e.get("path") for e in (facts.get("routes") or {}).get("endpoints", [])
                             if e.get("method") == "GET" and SIDE_EFFECTING.search(e.get("path", ""))]
@@ -195,11 +199,13 @@ def unauth_reachability(target: str, facts: dict, max_endpoints: int = 50) -> di
         "skipped_side_effecting": sorted(set(skipped)),
         "open_no_auth": openish,
         "results": results,
+        "endpoints_over_cap": over_cap,
         "fail_open_suspected": fail_open,
         "authn_trustworthy": not fail_open,
         "warning": FAIL_OPEN_WARNING if fail_open else "",
         "summary": f"{len(openish)}/{len(results)} data-read GET endpoints reachable WITHOUT auth"
                    + (" — review whether these should be public" if openish else " — all gated")
+                   + (f"  ·  ⚠ {over_cap} more over the {max_endpoints}-endpoint cap NOT tested" if over_cap else "")
                    + ("  ·  ⚠ FAIL-OPEN SUSPECTED (nothing enforced auth — results untrustworthy)" if fail_open else ""),
     }
@@ -219,7 +225,9 @@ def write_auth_enforcement(target: str, facts: dict, max_endpoints: int = 80) ->
         p = e.get("path", "")
         if e.get("method") in WRITE_VERBS and not SIDE_EFFECTING.search(p):
             eps.append((e["method"], p))
-    eps = sorted(set(eps))[:max_endpoints]
+    _all_eps = sorted(set(eps))
+    eps = _all_eps[:max_endpoints]
+    over_cap = max(0, len(_all_eps) - max_endpoints)
     results = []
     for method, path in eps:
@@ -229,9 +237,14 @@ def write_auth_enforcement(target: str, facts: dict, max_endpoints: int = 80) ->
             verdict = "auth-enforced"
         elif code in (200, 201, 204):
             verdict = "EXECUTED-UNAUTH"
-        elif code in (400, 422, 404, 405, 409, 415, 500):
+        elif code in (400, 422, 404, 405, 409, 415):
             verdict = "no-auth-gate (reached handler/validation)"
         else:
+            # 500 (and any other code) is INCONCLUSIVE: a 500 may be the auth layer itself throwing,
+            # not the handler running unauthenticated — so it must NOT become a no-auth-gate verdict
+            # (which would escalate to a HIGH missing-auth finding AND poison the calibration oracle
+            # with a confirmed-real sample). Matches the forged-token engine, which also excludes 500
+            # from "reached handler".
             verdict = f"http-{code}"
         results.append({"method": method, "path": path, "status": code, "verdict": verdict})
@@ -248,11 +261,13 @@ def write_auth_enforcement(target: str, facts: dict, max_endpoints: int = 80) ->
         "no_auth_gate": missing,
         "executed_unauth": executed,
         "results": results,
+        "endpoints_over_cap": over_cap,
         "fail_open_suspected": fail_open,
         "authn_trustworthy": not fail_open,
         "warning": FAIL_OPEN_WARNING if fail_open else "",
         "summary": f"{enforced}/{len(results)} write endpoints enforce auth · "
                    f"{len(missing)} reached with no auth gate · {len(executed)} executed unauthenticated"
+                   + (f"  ·  ⚠ {over_cap} more over the {max_endpoints}-endpoint cap NOT tested" if over_cap else "")
                    + ("  ·  ⚠ FAIL-OPEN SUSPECTED — results untrustworthy" if fail_open else ""),
     }
@@ -299,7 +314,9 @@ def forged_token_bypass(target: str, facts: dict, cookie_names=None,
         targets += [(e.get("method"), e.get("path", "")) for e in (facts.get("routes") or {}).get("endpoints", [])
                     if e.get("method") in WRITE_VERBS and "{" not in e.get("path", "")
                     and not SIDE_EFFECTING.search(e.get("path", ""))]
-    targets = sorted(set(targets))[:max_endpoints]
+    _all_targets = sorted(set(targets))
+    targets = _all_targets[:max_endpoints]
+    over_cap = max(0, len(_all_targets) - max_endpoints)
     results, bypassed = [], []
     for method, path in targets:
@@ -335,9 +352,11 @@ def forged_token_bypass(target: str, facts: dict, cookie_names=None,
         "tested": len(results),
         "bypassed": bypassed,
         "results": results,
+        "endpoints_over_cap": over_cap,
         "summary": f"{len(bypassed)}/{len(results)} gated route(s) accepted a forged unsigned token"
                    + (" — ⚠ SIGNATURE NOT VERIFIED (CWE-347 auth bypass)" if bypassed
-                      else " — all rejected the forged token"),
+                      else " — all rejected the forged token")
+                   + (f"  ·  ⚠ {over_cap} more over the {max_endpoints}-endpoint cap NOT tested" if over_cap else ""),
     }
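
Reviewer sketch (not part of the package): two of the `dynamic.py` fixes are easy to check in isolation. The cap change keeps the same slice but also counts what the slice dropped, and the `str()` change avoids the `TypeError` that `str.replace` raises when its second argument is an `int`. The helper names `cap_with_disclosure` and `fill_tenant`, and the sample paths, are hypothetical.

```python
def cap_with_disclosure(items, max_items):
    """Deduplicate, sort, cap, and report how many items fell over the cap."""
    all_items = sorted(set(items))
    kept = all_items[:max_items]
    over_cap = max(0, len(all_items) - max_items)
    return kept, over_cap


def fill_tenant(path, param, tenant):
    """Substitute a tenant id into a route template; str() tolerates numeric ids."""
    return path.replace("{" + param + "}", str(tenant))


kept, over_cap = cap_with_disclosure(["/b", "/a", "/c", "/a"], max_items=2)
print(kept, over_cap)

# A numeric tenant id, as it would arrive from a JSON config.
print(fill_tenant("/api/tenants/{tenantId}/invoices", "tenantId", 42))
```

Returning the overflow count next to the kept list lets the caller append it to the summary string, which is what the `endpoints_over_cap` field does in the three probe functions above.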

{websec_validator-0.3.0 → websec_validator-0.4.1}/src/websec_validator/extractors/__init__.py RENAMED

@@ -11,18 +11,20 @@ from pathlib import Path
 from .auth import AuthExtractor
 from .authz import AuthzExtractor
-from .base import Extractor, RepoContext
+from .base import MAX_FILES, Extractor, RepoContext
 from .client_exposure import ClientExposureExtractor
 from .client_integrity import ClientIntegrityExtractor
 from .graphql import GraphQLExtractor
 from .iac_ci import IacCiExtractor
 from .integrations import IntegrationsExtractor
+from .pii_exposure import PiiExposureExtractor
 from .policy_consistency import PolicyConsistencyExtractor
 from .routes import RoutesExtractor
 from .schemas import SchemasExtractor
 from .stack import StackExtractor
 from .surface import SurfaceExtractor
 from .tenant import TenantExtractor
+from .upload_security import UploadSecurityExtractor
 # Order matters: stack first (others read facts['stack']); authz after routes
 # (reads facts['routes']).
@@ -34,10 +36,12 @@ REGISTRY: list[Extractor] = [
     TenantExtractor(),
     PolicyConsistencyExtractor(),
     SurfaceExtractor(),
+    UploadSecurityExtractor(),
     SchemasExtractor(),
     IacCiExtractor(),
     ClientExposureExtractor(),
     ClientIntegrityExtractor(),
+    PiiExposureExtractor(),
     GraphQLExtractor(),
     IntegrationsExtractor(),
 ]
@@ -51,6 +55,10 @@ def run_all(root: Path, version: str, excludes: list | None = None) -> dict:
         "version": version,
         "target": str(root.resolve()),
         "files_scanned": len(ctx.code_files),
+        # PARTIAL-scan guard: the walker stops at MAX_FILES (filesystem order), so on a very large
+        # monorepo recon may miss files. Surface it loudly rather than implying full coverage.
+        "files_truncated": bool(getattr(ctx, "truncated", False)),
+        "file_cap": MAX_FILES,
     }
     for ext in REGISTRY:
         try:
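
Reviewer sketch (not part of the package): the `files_truncated` / `file_cap` facts only make sense if the walker records that it stopped early. The function below is a simplified stand-in for that idea, not the package's `RepoContext._walk`. It counts regular files only and sorts the listing for a stable order, both of which are assumptions; the diff's walker checks the cap before any filtering and iterates in filesystem order.

```python
import tempfile
from pathlib import Path


def walk_capped(root, max_files):
    """Collect up to max_files regular files; report whether the cap cut the walk short."""
    files, truncated = [], False
    for p in sorted(Path(root).rglob("*")):  # sorted: an assumption, for a stable order
        if not p.is_file():
            continue
        if len(files) >= max_files:
            truncated = True  # at least one more file exists beyond the cap
            break
        files.append(p)
    return files, truncated


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.py", "b.py", "c.py"):
        (Path(tmp) / name).write_text("pass\n")
    partial = walk_capped(tmp, max_files=2)
    full = walk_capped(tmp, max_files=5)

# Surface the partial-scan state the same way run_all() does in the hunk above.
facts = {"files_scanned": len(partial[0]), "files_truncated": partial[1], "file_cap": 2}
print(facts)
```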

{websec_validator-0.3.0 → websec_validator-0.4.1}/src/websec_validator/extractors/authz.py RENAMED

@@ -21,6 +21,12 @@ from .base import Extractor, RepoContext
 WRITE_VERBS = {"POST", "PUT", "PATCH", "DELETE"}
+# endpoint_guards feeds the missing-auth ledger (findings.build_ledger), so capping it low was a
+# silent coverage cliff: a big monorepo's unguarded write #401 never became a finding. Raised to
+# cover realistic monorepos; truncation beyond this is DISCLOSED (endpoint_guards_truncated), never
+# silent — mirrors constitution.py's "…and N more" pattern.
+_MAX_ENDPOINT_GUARDS = 5000
 GUARD = re.compile(
     r"requireAuth|requirePermission|requireRole|requireGroupAccess|isAuthenticated|"
     r"@login_required|@jwt_required|@permission_required|@roles_required|ensureAuth|"
@@ -181,7 +187,8 @@ class AuthzExtractor(Extractor):
             "roles_detected": sorted(r for r in roles if r),
             "guard_summary": {"with_visible_guard": protected,
                               "no_visible_guard": no_guard, "unknown": unknown},
-            "endpoint_guards": egs[:400],
+            "endpoint_guards": egs[:_MAX_ENDPOINT_GUARDS],
+            "endpoint_guards_truncated": max(0, len(egs) - _MAX_ENDPOINT_GUARDS),
             "write_endpoints_without_visible_guard": sorted(set(no_guard_writes))[:60],
             "unsafe_auth_decoders": unsafe_decoders[:30],
             "unverified_signature_routes": unverified_routes,

{websec_validator-0.3.0 → websec_validator-0.4.1}/src/websec_validator/extractors/base.py RENAMED

@@ -31,6 +31,31 @@ MAX_FILES = 12000
 MAX_BYTES = 2_000_000
+def path_in_skip_dir(path: str, root: "Path | str | None" = None) -> bool:
+    """True if `path` lies under a SKIP_DIR segment, measured RELATIVE to the scan root.
+    Checking the ABSOLUTE path's segments is the bug-005/bug-066 trap: when the scanned repo
+    itself lives under a skip-named ancestor (e.g. `.claude/worktrees/<id>`, `vendor/`,
+    `target/`, `~/.cache`), a segment ABOVE the root matches and the WHOLE tree — every route,
+    every finding — is silently dropped. Noir + the static scanners emit ABSOLUTE paths, so any
+    traversal that post-filters their output MUST strip the root prefix first (the walker already
+    does, via relative_to). Fail OPEN (keep the item) when the path can't be made relative — a
+    silent drop is the dangerous direction for a security tool. `root=None` preserves the legacy
+    raw-segment behavior for already-relative inputs.
+    """
+    p = (path or "").replace("\\", "/")
+    if not p:
+        return False
+    if root is not None:
+        try:
+            p = Path(path).resolve().relative_to(Path(root).resolve()).as_posix()
+        except (ValueError, OSError):
+            if Path(p).is_absolute():
+                return False  # absolute but outside the root → don't risk a false drop
+            # else: already a root-relative path → check its segments as-is below
+    return any(part in SKIP_DIRS for part in p.split("/"))
 class RepoContext:
     """Walk the tree once; cache file text; serve cheap queries to every extractor."""
@@ -47,9 +72,11 @@ class RepoContext:
     def _walk(self) -> None:
         n = 0
+        self.truncated = False          # set when MAX_FILES is hit → recon is PARTIAL, surface it
         for p in self.root.rglob("*"):
             if n >= MAX_FILES:
-                break
+                self.truncated = True   # rglob order is filesystem-dependent → which files drop is
+                break                   # nondeterministic; the consumer MUST know coverage is partial
             # match SKIP_DIRS against parts RELATIVE to the scan root — otherwise a
             # repo located under e.g. ~/.cache or any dir named like a skip-dir would
             # have its whole tree skipped.
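
Reviewer sketch (not part of the package): the `path_in_skip_dir` docstring describes a trap where a skip-named directory above the scan root causes the whole tree to be dropped. The comparison below reproduces that trap with pure path arithmetic. The `SKIP_DIRS` contents and the example paths are hypothetical, and unlike the diff's function this sketch handles absolute inputs only and never touches the filesystem (no `resolve()`).

```python
from pathlib import PurePosixPath

SKIP_DIRS = {".git", ".cache", "node_modules", "vendor"}  # hypothetical subset


def naive_in_skip_dir(path):
    """The trap: test every segment of the absolute path, ancestors of the root included."""
    return any(part in SKIP_DIRS for part in PurePosixPath(path).parts)


def relative_in_skip_dir(path, root):
    """Test only the segments below the scan root; keep the item if it is outside the root."""
    try:
        rel = PurePosixPath(path).relative_to(PurePosixPath(root))
    except ValueError:
        return False  # not under root: fail open rather than risk a silent drop
    return any(part in SKIP_DIRS for part in rel.parts)


root = "/home/dev/.cache/checkouts/app"   # the repo itself lives under a skip-named ancestor
source = root + "/src/routes.py"
dependency = root + "/node_modules/pkg/index.js"

print(naive_in_skip_dir(source), relative_in_skip_dir(source, root))
print(naive_in_skip_dir(dependency), relative_in_skip_dir(dependency, root))
```

The naive check flags `src/routes.py` only because `.cache` sits above the root, while the root-relative check keeps it and still skips the real `node_modules` subtree.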

{websec_validator-0.3.0 → websec_validator-0.4.1}/src/websec_validator/extractors/client_integrity.py RENAMED

@@ -48,6 +48,16 @@ OOB_ANCHOR = re.compile(
     r"|out[_-]of[_-]band|toChecksumAddress|getAddress\(|checksumAddress|\beip[_-]?55\b|verifyAddress"
     r"|address[_-]?verif|verif\w*[_-]?address|sendVerificationEmail|canonical[_-]?address", re.I)
+# WebSocket / realtime auth model — the CSWSH determinant (PTREQ0013000 #4). CSWSH is only
+# exploitable when the socket authenticates via an AMBIENT COOKIE the browser auto-attaches
+# cross-origin. A token placed in the connection payload / subprotocol and stored origin-scoped is
+# NOT exploitable (SOP blocks a cross-origin page from reading it). This lets us ANSWER a CSWSH
+# scanner flag instead of guessing — the retest pushed back on exactly this and won.
+WS_USAGE = re.compile(r"new\s+WebSocket\(|socket\.io|graphql-ws|subscriptions-transport-ws|appsync-realtime"
+                      r"|\bwss?://", re.I)
+WS_COOKIE_AUTH = re.compile(r"withCredentials\s*:\s*true|credentials\s*:\s*['\"]include['\"]"
+                            r"|document\.cookie[\s\S]{0,80}?(?:socket|ws\b|websocket)", re.I)
 class ClientIntegrityExtractor(Extractor):
     name = "client_integrity"
@@ -57,6 +67,7 @@ class ClientIntegrityExtractor(Extractor):
         sensitive, qr_files, clip_files = [], [], []
         csp_present = csp_self = csp_nonce = csp_unsafe = False
         oob = []
+        ws_usage = ws_cookie = False
         for _p, rel, text in ctx.iter_code():
             if SENSITIVE_VALUE.search(text):
                 if len(sensitive) < 30:
@@ -75,10 +86,15 @@ class ClientIntegrityExtractor(Extractor):
                     csp_unsafe = True
             if OOB_ANCHOR.search(text) and len(oob) < 20:
                 oob.append(rel)
+            if WS_USAGE.search(text):
+                ws_usage = True
+            if WS_COOKIE_AUTH.search(text):
+                ws_cookie = True
         # strict = a real `script-src 'self'` (+ a nonce / strict-dynamic) with NO unsafe-inline/eval
         strict_csp = bool(csp_present and csp_self and csp_nonce and not csp_unsafe)
         out_of_band = bool(oob)
+        ws_cookie_auth = bool(ws_usage and ws_cookie)   # the CSWSH determinant (ambient-cookie WS auth)
         findings = []
         present = bool(sensitive)
@@ -109,8 +125,24 @@ class ClientIntegrityExtractor(Extractor):
                               "cryptographically tamper-proof on the web — the goal is detectable, not "
                               "impossible (the limit that hardware wallets exist to solve)."})
+        # CSWSH is ONLY real when the WS auth is an ambient cookie (PTREQ0013000 #4). This lets us
+        # answer a CSWSH scanner flag instead of guessing — a bearer token in the payload is not it.
+        if ws_cookie_auth:
+            findings.append({
+                "severity": "MEDIUM", "confidence": "LOW", "attack_class": "cswsh",
+                "issue": "WebSocket authenticated via an ambient cookie (Cross-Site WebSocket Hijacking)",
+                "detail": "A WebSocket/realtime connection appears to authenticate via a cookie "
+                          "(withCredentials / credentials:'include'), which the browser auto-attaches "
+                          "cross-origin — so a page on any origin can open an authenticated socket (CSWSH, #4). "
+                          "Validate the Origin on the handshake, or move the credential into the connection "
+                          "payload / subprotocol and store it origin-scoped (not a cookie). If WS auth is "
+                          "already a token in the payload, CSWSH is NOT exploitable."})
         return {
             "sensitive_display": sorted(set(sensitive)),
+            "websocket_auth": ("cookie (CSWSH-exposed — validate Origin)" if ws_cookie_auth
+                               else "token-or-none (CSWSH not exploitable)" if ws_usage
+                               else "no websocket detected"),
             "qr_generation": sorted(set(qr_files)),
             "clipboard_copy": sorted(set(clip_files)),
             "strict_csp": strict_csp,

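The hunk above reduces the CSWSH check to two flags that are accumulated across every scanned file and then AND-ed together. Below is a minimal Python sketch of that decision, assuming the two patterns are copied verbatim from the diff. The helper name `classify_websocket_auth`, the shortened `"cookie"` / `"token-or-none"` labels, and the sample client snippets are hypothetical and are not part of the package.

```python
import re

# Patterns copied from the diff above.
WS_USAGE = re.compile(r"new\s+WebSocket\(|socket\.io|graphql-ws|subscriptions-transport-ws|appsync-realtime"
                      r"|\bwss?://", re.I)
WS_COOKIE_AUTH = re.compile(r"withCredentials\s*:\s*true|credentials\s*:\s*['\"]include['\"]"
                            r"|document\.cookie[\s\S]{0,80}?(?:socket|ws\b|websocket)", re.I)


def classify_websocket_auth(texts):
    """Hypothetical helper: mirror the extractor's flag logic over a list of file contents."""
    ws_usage = any(WS_USAGE.search(t) for t in texts)
    ws_cookie = any(WS_COOKIE_AUTH.search(t) for t in texts)
    if ws_usage and ws_cookie:
        return "cookie"              # shortened label (assumption), the CSWSH-exposed case
    if ws_usage:
        return "token-or-none"       # shortened label (assumption)
    return "no websocket detected"


# Illustrative client snippets, made up for this sketch.
TOKEN_WS = 'const ws = new WebSocket("wss://api.example.com/rt", ["bearer", token]);'
COOKIE_IO = 'import { io } from "socket.io-client"; const s = io(url, { withCredentials: true });'
COOKIE_FETCH = 'fetch("/api/me", { credentials: "include" });'

print(classify_websocket_auth([TOKEN_WS]))
print(classify_websocket_auth([COOKIE_IO]))
print(classify_websocket_auth([COOKIE_FETCH]))
print(classify_websocket_auth([TOKEN_WS, COOKIE_FETCH]))
```

The last call is the case worth noticing. Because the flags are aggregated per project rather than per file, a token-authenticated socket in one file plus an unrelated `fetch` with `credentials: "include"` in another is classified as cookie auth under this sketch, which is consistent with the finding being emitted at `"confidence": "LOW"`.
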
{websec_validator-0.3.0 → websec_validator-0.4.1}/src/websec_validator/extractors/graphql.py RENAMED

@@ -33,6 +33,11 @@ TENANT_ARG = re.compile(r"\b(\w+)\s*\(([^)]*\b(?:groupId|group_id|orgId|org_id|t
 # Identity-binding signals in a VTL resolver — the field is tied to the CALLER, not a free arg.
 VTL_AUTHZ = re.compile(r"\$ctx(?:tx)?\.identity|\$context\.identity|identity\.(?:sub|username|claims|resolverContext)"
                        r"|util\.unauthorized|\bgroupIds?\b[\s\S]{0,80}?\bcontains\b|#if\s*\(\s*!?\s*\$ctx\.identity")
+# Engine-level introspection disable on aws-cdk-lib appsync.GraphqlApi. The PTREQ0013000 RETEST
+# proved this IS available and un-bypassable (unlike a WAF string-match) — so a correctly-configured
+# AppSync API must NOT be flagged. This corrects the 0.3.0 false positive that always cried wolf.
+APPSYNC_INTROSPECTION_OFF = re.compile(r"introspectionConfig\s*:\s*[\w.]*\bDISABLED\b")
+APPSYNC_LIMITING = re.compile(r"\bqueryDepthLimit\b|\bresolverCountLimit\b")
 class GraphQLExtractor(Extractor):
@@ -47,10 +52,15 @@ class GraphQLExtractor(Extractor):
         introspection, playground, limiting, code_hit = "unknown", False, False, False
         appsync, aws_directives = False, False
+        appsync_introspection_off = appsync_limiting = False
         schema_texts = []          # (rel, text) for SDL files — parsed for Subscription authz
         for _p, rel, text in ctx.iter_code():
             if APPSYNC_MARK.search(text):
                 appsync = True
+            if APPSYNC_INTROSPECTION_OFF.search(text):
+                appsync_introspection_off = True
+            if APPSYNC_LIMITING.search(text):
+                appsync_limiting = True
             if rel.endswith((".graphql", ".gql")):
                 schema_texts.append((rel, text))
                 if AWS_AUTH_DIRECTIVE.search(text):
@@ -74,14 +84,20 @@ class GraphQLExtractor(Extractor):
         findings = []
         sub_authz = []
         if managed:
-            # AppSync exposes introspection and it is NOT disablable at the API layer (no Apollo-style
-            # `introspection:false`). The report's #2 proved the WAF that "blocks" it is bypassable.
-            findings.append({"severity": "MEDIUM", "issue": "AppSync GraphQL introspection reachable",
-                             "attack_class": "graphql",
-                             "detail": "AppSync exposes schema introspection; it can't be disabled at the API layer. "
-                                       "If a WAF blocks the keyword, that string-match is bypassable via Unicode-escape "
-                                       "/ junk-byte padding (PTREQ0013000 #2). Enforce field-level @aws_* auth + run the "
-                                       "appsync-introspection probe (it attempts the bypass) — don't rely on the WAF."})
+            # AppSync introspection CAN be disabled engine-level via
+            # `introspectionConfig: IntrospectionConfig.DISABLED` (aws-cdk-lib) — un-bypassable, unlike
+            # a WAF byte-match. Only flag when it is NOT disabled (retest correction to the 0.3.0 FP).
+            if not appsync_introspection_off:
+                findings.append({"severity": "MEDIUM", "issue": "AppSync GraphQL introspection not disabled engine-level",
+                                 "attack_class": "graphql",
+                                 "detail": "Set `introspectionConfig: appsync.IntrospectionConfig.DISABLED` so the engine "
+                                           "rejects __schema/__type regardless of encoding. A WAF byte-match on `__schema` "
+                                           "is NOT sufficient — bypassable via Unicode/JSON escapes and it only fronts one "
+                                           "endpoint (PTREQ0013000 #2). Run the appsync-introspection probe to confirm."})
+            if not (appsync_limiting or limiting):
+                findings.append({"severity": "LOW", "issue": "AppSync has no query depth / resolver-count limit",
+                                 "attack_class": "graphql",
+                                 "detail": "add `queryDepthLimit` + `resolverCountLimit` (alias / deep-query DoS guard)."})
             sub_authz = self._subscription_authz(ctx, schema_texts, findings)
         else:
             if introspection in ("enabled", "unknown"):
@@ -103,7 +119,8 @@ class GraphQLExtractor(Extractor):
                              or (["AppSync GraphQL API (HTTP + realtime WebSocket)"] if managed
                                  else ["(server detected; endpoint not routed by Noir)"]),
                 "schema_files": schema_files[:20],
-                "introspection": "appsync-reachable" if managed else introspection,
+                "introspection": (("appsync-disabled" if appsync_introspection_off else "appsync-reachable")
+                                  if managed else introspection),
                 "playground_enabled": playground, "query_limiting_detected": limiting,
                 "subscription_authz": sub_authz,
                 "findings": findings,

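The managed-AppSync branch in the hunk above now emits its findings conditionally. Below is a minimal Python sketch of that gating, assuming the two patterns are copied verbatim from the diff. The helper name `appsync_issues` and the configuration snippets are hypothetical illustrations, not complete CDK stacks, and the sketch leaves out the extractor's separate `limiting` flag, which the diff also accepts as evidence of query limiting.

```python
import re

# Patterns copied from the diff above (neither is compiled with re.I).
APPSYNC_INTROSPECTION_OFF = re.compile(r"introspectionConfig\s*:\s*[\w.]*\bDISABLED\b")
APPSYNC_LIMITING = re.compile(r"\bqueryDepthLimit\b|\bresolverCountLimit\b")


def appsync_issues(texts):
    """Hypothetical helper: severities the managed-AppSync branch would emit for these files."""
    introspection_off = any(APPSYNC_INTROSPECTION_OFF.search(t) for t in texts)
    limited = any(APPSYNC_LIMITING.search(t) for t in texts)
    issues = []
    if not introspection_off:
        issues.append("MEDIUM")   # introspection not disabled engine-level
    if not limited:
        issues.append("LOW")      # no query depth / resolver-count limit
    return issues


# Illustrative configuration fragments, made up for this sketch.
HARDENED = """new appsync.GraphqlApi(this, "Api", {
  name: "demo",
  introspectionConfig: appsync.IntrospectionConfig.DISABLED,
  queryDepthLimit: 5,
  resolverCountLimit: 50,
});"""
DEFAULT = 'new appsync.GraphqlApi(this, "Api", { name: "demo" });'
SNAKE_CASE = "introspection_config=appsync.IntrospectionConfig.DISABLED, query_depth_limit=5"

print(appsync_issues([HARDENED]))
print(appsync_issues([DEFAULT]))
print(appsync_issues([SNAKE_CASE]))
```

Under this sketch the hardened fragment yields no findings and the default one yields both. The third fragment is also flagged: the patterns are case-sensitive and expect the camelCase `introspectionConfig:` spelling, so a snake_case keyword form (as CDK bindings for Python would presumably use) is not recognized as disabling introspection.
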