websec-validator 0.2.8tar.gz → 0.3.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (72)

{websec_validator-0.2.8/src/websec_validator.egg-info → websec_validator-0.3.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: websec-validator
-Version: 0.2.8
+Version: 0.3.0
 Summary: Local-first security recon that briefs your AI coding agent: facts + tailored probe scripts, code-in / artifacts-out. No LLM, no server, no running app.
 Author: Ricardo Accioly
 License: MIT
@@ -82,20 +82,22 @@ Then point your agent at the output: **"Read `websec-out/AGENT-BRIEFING.md` and
 > That's the whole user surface: **`run`** (plus the optional, advanced **`dynamic`** live-probing step below). `recon`/`proof`/`calibrate` exist for developing the tool itself and are hidden from `--help` — you never need them.
-## What it extracts (11 deterministic extractors, no LLM)
+## What it extracts (13 deterministic extractors, no LLM)
 | | Dimension | Notable output |
 |---|---|---|
 | stack | languages, frameworks, datastores | monorepo-aware (aggregates every manifest) |
 | routes | every endpoint via **OWASP Noir** | method · path · typed params · code path |
-| auth | scheme + login surface | multi-scheme (primary jwt > passport), PyJWT/NextAuth/session aware |
+| auth | scheme + login surface + **insecure-default signing secrets** | multi-scheme; flags a hard-coded `JWT_SECRET \|\| 'dev-secret'` fallback (forgeable JWT) |
 | **authz** | access-control map | guard coverage + **write endpoints with no visible guard** + roles |
 | tenant | multi-tenancy key candidates | the BOLA boundary, by frequency |
-| surface | 12 user-input-gated sink classes | SSRF/SQLi/NoSQLi/traversal/SSTI/redirect/deser/XXE/proto-pollution/ReDoS/cmd/eval |
+| **password_policy** | cross-route policy consistency | flags a route enforcing fewer character classes than the strongest sibling (policy drift) |
+| surface | 14 sink classes | 12 user-input-gated (SSRF/SQLi/traversal/SSTI/…) **+ var-arg SSRF + response-side error-disclosure** |
 | schemas | data models + **privileged fields** | Pydantic/SQLAlchemy/Django/Prisma/Mongoose/TypeORM/Zod → `role`/`isAdmin`/`groupId` for mass-assignment targeting |
-| iac_ci | IaC + CI/CD | GitHub Actions injection, unpinned actions, Dockerfile-root, tfstate |
-| client_exposure | browser leakage | `NEXT_PUBLIC_*` secrets, server-secret-in-client, source maps |
-| graphql | GraphQL surface | introspection / playground / missing depth-limit |
+| iac_ci | IaC + CI/CD | GHA injection, unpinned actions, Dockerfile-root, tfstate **+ CDK AppSync `API_KEY` default-auth (CSWSH)** |
+| client_exposure | browser leakage | public-var secrets by **name + value-shape (`da2-…`) + CDK build-injection**, server-secret-in-client, source maps |
+| **client_integrity** | tamperable display (man-in-the-browser) | a fund-redirecting value (wallet address/QR) shown without a strict CSP / out-of-band anchor |
+| graphql | GraphQL surface | introspection / playground / depth-limit **+ AppSync subscription-authz (cross-group BOLA) + WAF-bypass-aware introspection** |
 | integrations | third-party + webhooks | webhooks missing signature verification |
 Plus **derived targeting** — IDOR / SSRF / open-redirect / upload / write / auth-endpoint
@@ -204,11 +206,13 @@ publisher** with project `websec-validator`, owner `raccioly`, repo `websec-vali
 ## Status / roadmap
-**Done:** 11-extractor recon (incl. schema/entity → mass-assignment targeting), cross-tool de-dup,
-tailored probe staging, agent briefing, traceable findings ledger with **calibrated confidence
-(CJE — Wilson CIs)**, proof harness, test suite, **Docker bundle** (all scanners + Noir, arch-aware),
-**dynamic phase v1** (authenticated read-only cross-tenant BOLA — validated live, reproduced a
-hand-pentest's 14/14).
+**Done:** 13-extractor recon (incl. schema/entity → mass-assignment targeting, the **AWS-CDK /
+managed-AppSync / VTL boundary** — CSWSH, cross-group subscription BOLA, forgeable-JWT default
+secrets — and a **man-in-the-browser / tamperable-display** class), cross-tool de-dup + **bundled
+Semgrep rules**, tailored probe staging, agent briefing, traceable findings ledger with **calibrated
+confidence (CJE — Wilson CIs)**, proof harness, test suite, **Docker bundle** (all scanners + Noir,
+arch-aware), **dynamic phase v1** (authenticated read-only cross-tenant BOLA — validated live,
+reproduced a hand-pentest's 14/14).
 **Next:** dynamic write-verb BOLA + JWT/auth probes + ZAP/Nuclei two-role diff (gated, they mutate),
 calibration on hand-labeled real repos (more representative base rate), ASVS index lookup, optional
 model-SDK adapters for no-agent fallback.

{websec_validator-0.2.8 → websec_validator-0.3.0}/README.md RENAMED

@@ -70,20 +70,22 @@ Then point your agent at the output: **"Read `websec-out/AGENT-BRIEFING.md` and
 > That's the whole user surface: **`run`** (plus the optional, advanced **`dynamic`** live-probing step below). `recon`/`proof`/`calibrate` exist for developing the tool itself and are hidden from `--help` — you never need them.
-## What it extracts (11 deterministic extractors, no LLM)
+## What it extracts (13 deterministic extractors, no LLM)
 | | Dimension | Notable output |
 |---|---|---|
 | stack | languages, frameworks, datastores | monorepo-aware (aggregates every manifest) |
 | routes | every endpoint via **OWASP Noir** | method · path · typed params · code path |
-| auth | scheme + login surface | multi-scheme (primary jwt > passport), PyJWT/NextAuth/session aware |
+| auth | scheme + login surface + **insecure-default signing secrets** | multi-scheme; flags a hard-coded `JWT_SECRET \|\| 'dev-secret'` fallback (forgeable JWT) |
 | **authz** | access-control map | guard coverage + **write endpoints with no visible guard** + roles |
 | tenant | multi-tenancy key candidates | the BOLA boundary, by frequency |
-| surface | 12 user-input-gated sink classes | SSRF/SQLi/NoSQLi/traversal/SSTI/redirect/deser/XXE/proto-pollution/ReDoS/cmd/eval |
+| **password_policy** | cross-route policy consistency | flags a route enforcing fewer character classes than the strongest sibling (policy drift) |
+| surface | 14 sink classes | 12 user-input-gated (SSRF/SQLi/traversal/SSTI/…) **+ var-arg SSRF + response-side error-disclosure** |
 | schemas | data models + **privileged fields** | Pydantic/SQLAlchemy/Django/Prisma/Mongoose/TypeORM/Zod → `role`/`isAdmin`/`groupId` for mass-assignment targeting |
-| iac_ci | IaC + CI/CD | GitHub Actions injection, unpinned actions, Dockerfile-root, tfstate |
-| client_exposure | browser leakage | `NEXT_PUBLIC_*` secrets, server-secret-in-client, source maps |
-| graphql | GraphQL surface | introspection / playground / missing depth-limit |
+| iac_ci | IaC + CI/CD | GHA injection, unpinned actions, Dockerfile-root, tfstate **+ CDK AppSync `API_KEY` default-auth (CSWSH)** |
+| client_exposure | browser leakage | public-var secrets by **name + value-shape (`da2-…`) + CDK build-injection**, server-secret-in-client, source maps |
+| **client_integrity** | tamperable display (man-in-the-browser) | a fund-redirecting value (wallet address/QR) shown without a strict CSP / out-of-band anchor |
+| graphql | GraphQL surface | introspection / playground / depth-limit **+ AppSync subscription-authz (cross-group BOLA) + WAF-bypass-aware introspection** |
 | integrations | third-party + webhooks | webhooks missing signature verification |
 Plus **derived targeting** — IDOR / SSRF / open-redirect / upload / write / auth-endpoint
@@ -192,11 +194,13 @@ publisher** with project `websec-validator`, owner `raccioly`, repo `websec-vali
 ## Status / roadmap
-**Done:** 11-extractor recon (incl. schema/entity → mass-assignment targeting), cross-tool de-dup,
-tailored probe staging, agent briefing, traceable findings ledger with **calibrated confidence
-(CJE — Wilson CIs)**, proof harness, test suite, **Docker bundle** (all scanners + Noir, arch-aware),
-**dynamic phase v1** (authenticated read-only cross-tenant BOLA — validated live, reproduced a
-hand-pentest's 14/14).
+**Done:** 13-extractor recon (incl. schema/entity → mass-assignment targeting, the **AWS-CDK /
+managed-AppSync / VTL boundary** — CSWSH, cross-group subscription BOLA, forgeable-JWT default
+secrets — and a **man-in-the-browser / tamperable-display** class), cross-tool de-dup + **bundled
+Semgrep rules**, tailored probe staging, agent briefing, traceable findings ledger with **calibrated
+confidence (CJE — Wilson CIs)**, proof harness, test suite, **Docker bundle** (all scanners + Noir,
+arch-aware), **dynamic phase v1** (authenticated read-only cross-tenant BOLA — validated live,
+reproduced a hand-pentest's 14/14).
 **Next:** dynamic write-verb BOLA + JWT/auth probes + ZAP/Nuclei two-role diff (gated, they mutate),
 calibration on hand-labeled real repos (more representative base rate), ASVS index lookup, optional
 model-SDK adapters for no-agent fallback.

{websec_validator-0.2.8 → websec_validator-0.3.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "websec-validator"
-version = "0.2.8"
+version = "0.3.0"
 description = "Local-first security recon that briefs your AI coding agent: facts + tailored probe scripts, code-in / artifacts-out. No LLM, no server, no running app."
 readme = "README.md"
 requires-python = ">=3.11"
@@ -25,4 +25,4 @@ package-dir = { "" = "src" }
 where = ["src"]
 [tool.setuptools.package-data]
-websec_validator = ["templates/probes/*", "templates/reports/*", "corpus.json", "calibration.json"]
+websec_validator = ["templates/probes/*", "templates/reports/*", "rules/*.yml", "corpus.json", "calibration.json"]

{websec_validator-0.2.8 → websec_validator-0.3.0}/src/websec_validator/briefing.py RENAMED

@@ -49,9 +49,27 @@ def render(facts: dict, scanners: dict, scan_results: list, probe_manifest: list
     iac_lines = "\n".join(f"- **{f['severity']}** `{f['kind']}` — `{f['file']}` — {f['detail']}"
                           for f in iac_findings[:20]) or "_none_"
     client = facts.get("client_exposure", {})
-    client_leaks = client.get("public_secret_leaks", []) + client.get("server_secret_in_client_component", [])
+    client_leaks = (client.get("public_secret_leaks", []) + client.get("server_secret_in_client_component", [])
+                    + client.get("public_secret_value_leaks", []) + client.get("public_var_from_cfn_output", []))
     client_section = _bullets(client_leaks) if client_leaks else "_none detected_"
+    ci = facts.get("client_integrity", {})
+    ci_findings = ci.get("findings", [])
+    ci_section = ("\n".join(f"- **{f.get('severity')}/{f.get('confidence','LOW')}** {f.get('issue')}"
+                            for f in ci_findings) if ci_findings
+                  else "_no fund-redirecting display values detected (MITB class N/A)_" if not ci.get("sensitive_display")
+                  else "_sensitive display present; strict CSP + out-of-band anchor look present — spot-check_")
+    pp = facts.get("password_policy", {})
+    if pp.get("drift"):
+        pp_line = f"⚠ DRIFT — {len(pp['drift'])} sibling route(s) weaker than the strongest set {pp.get('strongest_policy')}"
+    elif pp.get("weak_policy"):
+        pp_line = f"uniform but WEAK — enforces only {pp.get('weak_policy')}"
+    elif pp.get("password_blocks"):
+        pp_line = f"looks consistent across {len(pp['password_blocks'])} validator block(s)"
+    else:
+        pp_line = "_no password validators detected_"
     gql = facts.get("graphql", {})
     if gql.get("present"):
         gfind = "; ".join(f"{x['severity']} {x['issue']}" for x in gql.get("findings", [])) or "no obvious issues"
@@ -158,6 +176,11 @@ Production source maps exposed: {client.get("production_source_maps", False)}
 **GraphQL surface:** {gql_line}
+**Password policy (cross-route consistency):** {pp_line}
+**Client integrity — man-in-the-browser / tamperable display:**
+{ci_section}
 **Third-party integrations:** {integ_line}
 {wh_line}
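
The password-policy line added to the briefing is chosen by a fixed priority: drift first, then a uniformly weak policy, then a consistent policy, then nothing detected. A minimal sketch of that ordering follows. The function name `password_policy_line` is hypothetical, the message wording is simplified, and the shapes of the values in the `pp` dict are assumptions; only the key names come from the diff.

```python
def password_policy_line(pp: dict) -> str:
    """Pick one summary line for the briefing; earlier branches win."""
    if pp.get("drift"):
        # At least one route enforces less than the strongest sibling.
        return (f"DRIFT: {len(pp['drift'])} sibling route(s) weaker than "
                f"the strongest set {pp.get('strongest_policy')}")
    if pp.get("weak_policy"):
        # All routes agree, but the shared policy is weak.
        return f"uniform but WEAK: enforces only {pp.get('weak_policy')}"
    if pp.get("password_blocks"):
        return f"looks consistent across {len(pp['password_blocks'])} validator block(s)"
    return "no password validators detected"


# Assumed example input: one drifting route and a weak shared policy.
sample = {"drift": ["POST /reset"], "weak_policy": ["length"],
          "strongest_policy": ["length", "digit", "upper"]}
print(password_policy_line(sample))
```

Because drift is checked first, a repo that has both drift and a weak baseline reports the drift, which is the more actionable of the two.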

{websec_validator-0.2.8 → websec_validator-0.3.0}/src/websec_validator/constitution.py RENAMED

@@ -60,6 +60,22 @@ def build(facts: dict, ledger: dict | None = None) -> list:
     add("Secret hygiene", "Given the repo + git history, Then no live credential is present and no secret "
         "reaches the client bundle", "recon")
+    # P6 — Signing-secret integrity (forgeable JWT, PTREQ0013000 #8)
+    for sd in ((facts.get("auth", {}) or {}).get("insecure_secret_defaults", []) or [])[:5]:
+        add("Signing-secret integrity", f"Given the signing-secret env var is unset, When the app boots, Then it "
+            f"FAILS CLOSED — no hard-coded fallback ({sd.get('literal')!r} in {sd.get('file')})",
+            sd.get("file", "recon"))
+    # P7 — Subscription authorization (cross-group BOLA, #5)
+    for s in ((facts.get("graphql", {}) or {}).get("subscription_authz", []) or [])[:6]:
+        add("Subscription authorization", f"Given a tenant id you do NOT own, When subscribing to `{s.get('field')}`, "
+            f"Then the server rejects it (binds the tenant arg to your identity)", "recon")
+    # P8 — Display integrity (man-in-the-browser, the agent-wallet class)
+    if (facts.get("client_integrity", {}) or {}).get("sensitive_display"):
+        add("Display integrity", "Given a fund-redirecting value is displayed, Then a strict CSP kills the scalable "
+            "tamper vector AND an out-of-band anchor makes single-surface tampering user-detectable", "recon")
     return inv
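
The "Signing-secret integrity" invariant above asks the target app to fail closed when its signing secret is unset. As an illustration of what a compliant app might look like (this is not code from the package), here is a minimal sketch. The function name `load_signing_secret`, the default variable name `JWT_SECRET`, and the `min_length` threshold are all assumptions.

```python
import os


def load_signing_secret(env=None, name: str = "JWT_SECRET", min_length: int = 32) -> str:
    """Return the signing secret, or raise instead of falling back to a literal."""
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        # Fail closed: no hard-coded fallback such as 'dev-secret'.
        raise RuntimeError(f"{name} is not set; refusing to start")
    if len(value) < min_length:
        raise RuntimeError(f"{name} is shorter than {min_length} characters")
    return value
```

Passing `env` explicitly keeps the function testable without mutating the real process environment.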

{websec_validator-0.2.8 → websec_validator-0.3.0}/src/websec_validator/extractors/__init__.py RENAMED

@@ -13,9 +13,11 @@ from .auth import AuthExtractor
 from .authz import AuthzExtractor
 from .base import Extractor, RepoContext
 from .client_exposure import ClientExposureExtractor
+from .client_integrity import ClientIntegrityExtractor
 from .graphql import GraphQLExtractor
 from .iac_ci import IacCiExtractor
 from .integrations import IntegrationsExtractor
+from .policy_consistency import PolicyConsistencyExtractor
 from .routes import RoutesExtractor
 from .schemas import SchemasExtractor
 from .stack import StackExtractor
@@ -30,10 +32,12 @@ REGISTRY: list[Extractor] = [
     AuthExtractor(),
     AuthzExtractor(),
     TenantExtractor(),
+    PolicyConsistencyExtractor(),
     SurfaceExtractor(),
     SchemasExtractor(),
     IacCiExtractor(),
     ClientExposureExtractor(),
+    ClientIntegrityExtractor(),
     GraphQLExtractor(),
     IntegrationsExtractor(),
 ]
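
The registry is an ordered list, and each extractor's `extract(ctx, facts)` receives the facts gathered so far, so position in the list can matter. The sketch below shows one way such a registry could be driven. The runner `run_all`, the plain-dict `ctx`, and the two toy extractors are hypothetical stand-ins, not the package's actual classes or runner.

```python
class Extractor:
    name = "base"

    def extract(self, ctx: dict, facts: dict) -> dict:
        raise NotImplementedError


class RoutesExtractor(Extractor):
    name = "routes"

    def extract(self, ctx, facts):
        return {"endpoints": ctx.get("endpoints", [])}


class AuthExtractor(Extractor):
    name = "auth"

    def extract(self, ctx, facts):
        # Reads the earlier extractor's output, so it must run after "routes".
        route_count = len(facts.get("routes", {}).get("endpoints", []))
        return {"route_count": route_count, "reliable_signal": route_count > 0}


REGISTRY = [RoutesExtractor(), AuthExtractor()]


def run_all(ctx: dict, registry=REGISTRY) -> dict:
    """Run extractors in list order, accumulating results keyed by name."""
    facts: dict = {}
    for extractor in registry:
        facts[extractor.name] = extractor.extract(ctx, facts)
    return facts
```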

{websec_validator-0.2.8 → websec_validator-0.3.0}/src/websec_validator/extractors/auth.py RENAMED

@@ -27,6 +27,31 @@ COOKIE_READ = re.compile(
 _COOKIE_RESERVED = {"get", "set", "getall", "has", "delete", "clear", "tostring",
                     "foreach", "entries", "keys", "values", "size", "name", "value", "length"}
+# Insecure DEFAULT signing secret — a hard-coded fallback on a secret/key var (the forgeable-JWT
+# class, PTREQ0013000 #8). JS/TS: `process.env.JWT_SECRET || 'dev-secret-do-not-use-in-prod'`;
+# Python: os.environ.get('JWT_SECRET', 'dev-secret'). A quoted fallback on a *SECRET/*KEY var is
+# almost never benign — and if it's a dev-ish placeholder AND the repo actually signs JWTs, anyone
+# who reads the source can forge tokens for any user/role.
+_SECRET_VAR = (r"(?:JWT[_-]?SECRET|TOKEN[_-]?SECRET|REFRESH[_-]?SECRET|SIGNING[_-]?KEY"
+               r"|SESSION[_-]?SECRET|COOKIE[_-]?SECRET|AUTH[_-]?SECRET|APP[_-]?SECRET"
+               r"|HMAC[_-]?KEY|PRIVATE[_-]?KEY|SECRET[_-]?KEY|SECRET)")
+SECRET_DEFAULT_JS = re.compile(
+    _SECRET_VAR + r"['\"\]\s]*\s*(?:\|\||\?\?)\s*[`'\"]([^`'\"]{3,80})[`'\"]", re.I)
+SECRET_DEFAULT_PY = re.compile(
+    r"(?:os\.environ\.get|os\.getenv|getenv)\(\s*['\"][^'\"]*" + _SECRET_VAR
+    + r"[^'\"]*['\"]\s*,\s*['\"]([^'\"]{3,80})['\"]", re.I)
+# placeholder markers that make a fallback unambiguously a non-production dev secret
+SECRET_DEVISH = re.compile(r"dev|do[_-]?not[_-]?use|change[_-]?(?:me|it|this)|placeholder|secret|test"
+                           r"|local|example|sample|default|your[_-]|xxx|todo|fixme|123456|password", re.I)
+JWT_SIGN_VERIFY = re.compile(r"jwt\.(?:sign|verify)|jsonwebtoken|\bjose\b|jwtVerify|SignJWT|jwt\.encode", re.I)
+def _looks_like_example(rel: str) -> bool:
+    """Example/doc files are MEANT to hold placeholder secrets — don't cry forgeable-JWT on them."""
+    r = rel.lower()
+    return (".example" in r or ".sample" in r or ".dist" in r or ".template" in r
+            or "/docs/" in r or "/doc/" in r or "/examples/" in r or r.endswith((".md", ".mdx")))
 class AuthExtractor(Extractor):
     name = "auth"
@@ -41,6 +66,8 @@ class AuthExtractor(Extractor):
         jwt = passport = session = apikey = 0
         guard_files = []
         cookie_names: list[str] = []
+        secret_defaults: list = []          # (file, literal) hard-coded fallback signing secrets
+        jwt_sign_verify = False             # does the repo actually sign/verify JWTs?
         for _p, rel, text in ctx.iter_code():
             if JWT_LIBS.search(text):
                 jwt += 1
@@ -57,12 +84,34 @@ class AuthExtractor(Extractor):
                     name = m.group(1) or m.group(2) or m.group(3)
                     if name and name.lower() not in _COOKIE_RESERVED and name not in cookie_names:
                         cookie_names.append(name)
+            if JWT_SIGN_VERIFY.search(text):
+                jwt_sign_verify = True
+            if not _looks_like_example(rel):
+                for mm in SECRET_DEFAULT_JS.finditer(text):
+                    secret_defaults.append((rel, mm.group(1)))
+                for mm in SECRET_DEFAULT_PY.finditer(text):
+                    secret_defaults.append((rel, mm.group(1)))
+        # Hard-coded fallback signing secret → forgeable-JWT lead (PTREQ0013000 #8). De-dup by
+        # (file, literal); mark dev-ish placeholders. findings.py escalates dev-ish + jwt-in-use to
+        # CRITICAL; probes.stage seeds the literal into the hs256 brute-force candidate list.
+        seen_sd: set = set()
+        insecure_secret_defaults: list = []
+        for rel_, lit in secret_defaults:
+            if (rel_, lit) in seen_sd:
+                continue
+            seen_sd.add((rel_, lit))
+            insecure_secret_defaults.append({"file": rel_, "literal": lit,
+                                             "dev_ish": bool(SECRET_DEVISH.search(lit))})
+            if len(insecure_secret_defaults) >= 20:
+                break
         nextauth = "nextauth" in frameworks or any("nextauth" in e.lower() for e in auth_eps)
         # Detect ALL schemes present, then pick a primary by priority. A JWT app
         # that also wires Passport for SSO must read as primary=jwt, not passport
         # (Passport is often SSO-only). Priority: nextauth > jwt > session > passport > api-key.
+        route_count = len(routes.get("endpoints", []))
         detected = []
         if nextauth:
             detected.append("nextauth (session JWT in cookie)")
@@ -88,6 +137,13 @@ class AuthExtractor(Extractor):
             "cookie_names": cookie_names[:15],
             "guard_files": guard_files,
             "signal_counts": {"jwt": jwt, "passport": passport, "session": session, "api_key": apikey},
-            "note": "AGENT: confirm the PRIMARY auth flow + how a test token is minted before the JWT/auth "
-                    "probes. Multiple schemes often mean primary bearer/session + secondary SSO (passport).",
+            "insecure_secret_defaults": insecure_secret_defaults,   # CRITICAL-class (forgeable JWT #8)
+            "jwt_sign_verify_present": jwt_sign_verify,
+            "route_count": route_count,
+            "reliable_signal": route_count > 0 or bool(nextauth),
+            "note": (("⚠ No HTTP routes detected — this auth scheme is LOW-CONFIDENCE (likely a "
+                      "library/CLI/scanner that merely mentions auth, or routes weren't parsed). "
+                      if not (route_count > 0 or nextauth) else "")
+                     + "AGENT: confirm the PRIMARY auth flow + how a test token is minted before the "
+                     "JWT/auth probes. Multiple schemes often mean primary bearer/session + secondary SSO."),
         }
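
To see what the new fallback-secret patterns catch, the sketch below copies the three regexes from this hunk and runs them over sample strings. The helper `find_secret_defaults` and the sample inputs are hypothetical; the real extractor additionally skips example/doc files, de-duplicates by file, and caps the list at 20 entries.

```python
import re

_SECRET_VAR = (r"(?:JWT[_-]?SECRET|TOKEN[_-]?SECRET|REFRESH[_-]?SECRET|SIGNING[_-]?KEY"
               r"|SESSION[_-]?SECRET|COOKIE[_-]?SECRET|AUTH[_-]?SECRET|APP[_-]?SECRET"
               r"|HMAC[_-]?KEY|PRIVATE[_-]?KEY|SECRET[_-]?KEY|SECRET)")
SECRET_DEFAULT_JS = re.compile(
    _SECRET_VAR + r"['\"\]\s]*\s*(?:\|\||\?\?)\s*[`'\"]([^`'\"]{3,80})[`'\"]", re.I)
SECRET_DEFAULT_PY = re.compile(
    r"(?:os\.environ\.get|os\.getenv|getenv)\(\s*['\"][^'\"]*" + _SECRET_VAR
    + r"[^'\"]*['\"]\s*,\s*['\"]([^'\"]{3,80})['\"]", re.I)
SECRET_DEVISH = re.compile(r"dev|do[_-]?not[_-]?use|change[_-]?(?:me|it|this)|placeholder|secret|test"
                           r"|local|example|sample|default|your[_-]|xxx|todo|fixme|123456|password", re.I)


def find_secret_defaults(text: str) -> list[dict]:
    """Return each hard-coded fallback literal and whether it looks like a dev placeholder."""
    literals = SECRET_DEFAULT_JS.findall(text) + SECRET_DEFAULT_PY.findall(text)
    return [{"literal": lit, "dev_ish": bool(SECRET_DEVISH.search(lit))} for lit in literals]


js_sample = "const key = process.env.JWT_SECRET || 'dev-secret-do-not-use-in-prod';"
py_sample = 'key = os.environ.get("JWT_SECRET", "dev-secret")'
safe_sample = 'key = os.environ["JWT_SECRET"]'  # no fallback, so nothing to flag

for sample in (js_sample, py_sample, safe_sample):
    print(find_secret_defaults(sample))
```

Note that `SECRET_DEVISH` only labels the literal; per the comment in the hunk, escalation to CRITICAL happens elsewhere and also requires that the repo actually signs or verifies JWTs.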

{websec_validator-0.2.8 → websec_validator-0.3.0}/src/websec_validator/extractors/authz.py RENAMED

@@ -142,6 +142,28 @@ class AuthzExtractor(Extractor):
                 for dec in sorted(set(UNSAFE_DECODER.findall(text))):
                     unsafe_decoders.append({"file": rel, "decoder": dec})
+        # A guard DEFINED in a file that also calls an unsafe/unverified decoder authenticates via
+        # an unverified decode. Routes that call such a guard are the static "at-risk" set for the
+        # forged-token bypass class — the dynamic probe confirms which actually fall, but this points
+        # at them even with NO live target (turns the F5 hypothesis into named routes).
+        unverified_routes: list = []
+        unsafe_files = {ud["file"] for ud in unsafe_decoders}
+        if unsafe_files:
+            guard_def = re.compile(r"(?:export\s+)?(?:async\s+)?(?:function|const)\s+"
+                                   r"(require\w+|ensure\w+|\w*[Aa]uth\w*|verify\w+)\b")
+            unsafe_guards = set()
+            for _p, rel, text in ctx.iter_code():
+                if rel in unsafe_files:
+                    unsafe_guards.update(g for g in guard_def.findall(text) if len(g) >= 5)
+            if unsafe_guards:
+                call = re.compile(r"\b(?:" + "|".join(re.escape(g) for g in sorted(unsafe_guards)) + r")\s*\(")
+                for e in endpoints:
+                    cp = e.get("code_path", "")
+                    t = ctx.text(Path(cp)) if cp else ""
+                    if t and call.search(t):
+                        unverified_routes.append(f"{e.get('method')} {e.get('path')}")
+            unverified_routes = sorted(set(unverified_routes))[:60]
         if global_auth:
             where = f"`{mw['file']}` (matcher {mw.get('matchers') or '—'})" if mw_auth else "`app.use(<auth>)`"
             note = (f"A GLOBAL auth middleware ({where}) was detected — most routes are protected by default. "
@@ -162,5 +184,6 @@ class AuthzExtractor(Extractor):
             "endpoint_guards": egs[:400],
             "write_endpoints_without_visible_guard": sorted(set(no_guard_writes))[:60],
             "unsafe_auth_decoders": unsafe_decoders[:30],
+            "unverified_signature_routes": unverified_routes,
             "note": note,
         }
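
The at-risk route computation above is a two-pass match: collect guard names defined in files that use an unsafe decoder, then find routes whose source calls one of those names. A minimal sketch of those two passes, using the guard-definition regex from the hunk, is below. The function `at_risk_routes` and its in-memory inputs are hypothetical; the real extractor reads files through `ctx` and caps the result at 60 entries.

```python
import re

GUARD_DEF = re.compile(r"(?:export\s+)?(?:async\s+)?(?:function|const)\s+"
                       r"(require\w+|ensure\w+|\w*[Aa]uth\w*|verify\w+)\b")


def at_risk_routes(unsafe_sources: list[str], route_sources: dict[str, str]) -> list[str]:
    """Routes whose handler source calls a guard defined alongside an unsafe decoder."""
    guards: set[str] = set()
    for text in unsafe_sources:
        # Pass 1: guard names of at least 5 characters, as in the hunk.
        guards.update(g for g in GUARD_DEF.findall(text) if len(g) >= 5)
    if not guards:
        return []
    # Pass 2: one alternation regex matching a call to any collected guard.
    call = re.compile(r"\b(?:" + "|".join(re.escape(g) for g in sorted(guards)) + r")\s*\(")
    return sorted(route for route, text in route_sources.items() if call.search(text))


unsafe = ["export async function requireUser(req) {\n  return jwt.decode(token)\n}"]
routes = {
    "POST /api/orders": "const user = await requireUser(req)",
    "GET /health": "return new Response('ok')",
}
print(at_risk_routes(unsafe, routes))
```

The match is by name only, so a safe guard that happens to share a name with an unsafe one in another file would also be reported; the hunk's comment treats the result as a lead for the dynamic probe to confirm.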

{websec_validator-0.2.8 → websec_validator-0.3.0}/src/websec_validator/extractors/base.py RENAMED

@@ -20,7 +20,13 @@ SKIP_DIRS = {".git", "node_modules", "dist", "build", ".next", ".nuxt", "venv",
              # agent tooling + editor dirs + worktree copies — not the target app
              ".wolf", ".claude", ".worktrees", ".idea", ".vscode", ".agent", ".agents"}
 CODE_EXT = {".js", ".jsx", ".ts", ".tsx", ".mjs", ".cjs", ".py", ".go", ".rb",
-            ".java", ".php", ".prisma"}
+            ".java", ".php", ".prisma",
+            # Managed-cloud surfaces: AppSync GraphQL SDL (@aws_* auth directives) + VTL
+            # resolvers (where realtime/subscription authz actually lives, or is missing).
+            # PTREQ0013000 #2/#5 lived in these file types — previously invisible to every
+            # iter_code()-based extractor. routes.py SPEC_PATH still splits .graphql/.gql out
+            # of the route list so SDL doesn't generate phantom endpoints.
+            ".graphql", ".gql", ".vtl"}
 MAX_FILES = 12000
 MAX_BYTES = 2_000_000

websec_validator-0.3.0/src/websec_validator/extractors/client_exposure.py ADDED

@@ -0,0 +1,81 @@
+"""Client-side exposure extractor — secrets that leak into the browser bundle.
+The Next.js/Vite footgun: any `NEXT_PUBLIC_*` / `VITE_*` var is inlined into the
+client bundle, and a server-only secret referenced from a client component ships
+to every visitor. Cheap static scan, high signal.
+"""
+from __future__ import annotations
+import re
+from .base import Extractor, RepoContext
+PUBLIC_ENV = re.compile(r"\b(NEXT_PUBLIC_\w+|VITE_\w+|REACT_APP_\w+|GATSBY_\w+|EXPO_PUBLIC_\w+|PUBLIC_\w{2,})\b")
+SECRETISH = re.compile(r"SECRET|PRIVATE|TOKEN|PASSWORD|PASSWD|API_?KEY|ACCESS_?KEY|CLIENT_SECRET|CREDENTIAL", re.I)
+SERVER_SECRET = re.compile(r"process\.env\.([A-Z0-9_]*(?:SECRET|PRIVATE|TOKEN|PASSWORD|API_?KEY|ACCESS_?KEY)[A-Z0-9_]*)")
+# VALUE-aware leak detection — hardens the name-based scan above so it survives a benign rename
+# (the PTREQ0013000 #3 gap: a real key carried in a non-secret-named public var slips the name scan).
+# We match distinctive secret SHAPES, not var names. AppSync's `da2-` key has NO scanner rule at all,
+# so we always flag it; the generic shapes (which trivy/gitleaks already catch) are only flagged when
+# the file is client-reachable, to add the ships-to-browser angle without duplicating those scanners.
+SECRET_SHAPES = [
+    (re.compile(r"\bda2-[a-z0-9]{26}\b"), "AWS AppSync API key (da2-…)", True),
+    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "AWS access key id (AKIA)", False),
+    (re.compile(r"\bAIza[0-9A-Za-z_\-]{35}\b"), "Google API key (AIza…)", False),
+    (re.compile(r"\bsk_live_[0-9A-Za-z]{16,}\b"), "Stripe live secret key (sk_live_…)", False),
+    (re.compile(r"\beyJ[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{4,}\b"), "JWT (eyJ…)", False),
+]
+# CDK build-time injection: a CloudFormation output / SSM param / Secret wired INTO a public build
+# var — e.g. CodeBuild `envFromCfnOutputs: { VITE_APPSYNC_API_KEY: appsyncApiKeyOutput }`. Invisible
+# to every secret scanner because the value isn't in source; it's injected at build time (the exact
+# mechanism that shipped the AppSync key to the browser in PTREQ0013000 #3).
+CFN_TO_PUBLIC = re.compile(
+    r"(?:envFromCfnOutputs|buildEnvironment|environmentVariables|partialBuildSpec)"
+    r"[\s\S]{0,400}?((?:NEXT_PUBLIC_|VITE_|REACT_APP_|GATSBY_|EXPO_PUBLIC_)\w*)\s*[:=]\s*"
+    r"(\w+Output\b|[\w.]+\.value\b|CfnOutput|StringParameter|(?:Fn\.)?importValue|Secret\b)", re.I)
+class ClientExposureExtractor(Extractor):
+    name = "client_exposure"
+    category = "exposure"
+    def extract(self, ctx: RepoContext, facts: dict) -> dict:
+        public_vars: set = set()
+        public_secret_leaks = []      # public-prefixed AND secret-named → ships to client
+        server_secret_in_client = []  # server secret referenced from a 'use client' file
+        public_value_leaks = []       # secret-SHAPE literal in client-reachable code (rename-proof, #3)
+        public_var_from_cfn = []      # CDK output/secret injected into a public build var (#3)
+        for _p, rel, text in ctx.iter_code():
+            for v in PUBLIC_ENV.findall(text):
+                public_vars.add(v)
+                if SECRETISH.search(v):
+                    public_secret_leaks.append(f"{v}  ({rel})")
+            if "use client" in text[:200] or "'use client'" in text[:200] or '"use client"' in text[:200]:
+                for s in SERVER_SECRET.findall(text):
+                    server_secret_in_client.append(f"{s}  ({rel})")
+            client_reachable = bool(PUBLIC_ENV.search(text)) or "use client" in text[:400]
+            for rx, label, always in SECRET_SHAPES:
+                if (always or client_reachable) and rx.search(text):
+                    public_value_leaks.append(f"{label}  ({rel})")
+            for m in CFN_TO_PUBLIC.finditer(text):
+                public_var_from_cfn.append(f"{m.group(1)} ← {m.group(2)}  ({rel})")
+        nextcfg = (ctx.manifest("next.config.js") + ctx.manifest("next.config.mjs")
+                   + ctx.manifest("next.config.ts"))
+        sourcemaps = "productionBrowserSourceMaps: true" in nextcfg
+        return {
+            "public_env_vars": sorted(public_vars)[:40],
+            "public_secret_leaks": sorted(set(public_secret_leaks)),     # HIGH if non-empty
+            "server_secret_in_client_component": sorted(set(server_secret_in_client)),  # HIGH if non-empty
+            "public_secret_value_leaks": sorted(set(public_value_leaks)),   # HIGH — value-detected, rename-proof
+            "public_var_from_cfn_output": sorted(set(public_var_from_cfn)),  # HIGH — CDK build-injected to client
+            "production_source_maps": sourcemaps,
+            "note": "public_secret_leaks / server_secret_in_client_component / public_secret_value_leaks / "
+                    "public_var_from_cfn_output ship secrets to the browser — treat as HIGH and confirm. "
+                    "Value/CFN-injection detection survives a benign var rename (the #3 gap). Plain "
+                    "NEXT_PUBLIC_* without a secret name/value/CFN-wire are usually fine.",
+        }
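
The value-shape scan gates each pattern on an `always` flag: the AppSync `da2-` shape is reported anywhere, while generic shapes are reported only in client-reachable files. The sketch below reproduces that gating with two of the five shapes from the file. The helper `shape_leaks` and the synthetic key strings are hypothetical and built only to satisfy the regexes.

```python
import re

PUBLIC_ENV = re.compile(r"\b(NEXT_PUBLIC_\w+|VITE_\w+|REACT_APP_\w+|GATSBY_\w+|EXPO_PUBLIC_\w+|PUBLIC_\w{2,})\b")
SECRET_SHAPES = [
    (re.compile(r"\bda2-[a-z0-9]{26}\b"), "AWS AppSync API key", True),   # always flagged
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "AWS access key id", False),    # client files only
]


def shape_leaks(text: str) -> list[str]:
    """Labels of secret shapes found in one file's text, honouring the `always` gate."""
    client_reachable = bool(PUBLIC_ENV.search(text)) or "use client" in text[:400]
    return [label for rx, label, always in SECRET_SHAPES
            if (always or client_reachable) and rx.search(text)]


appsync_key = "da2-" + "a1" * 13   # synthetic: 26 lowercase alphanumerics
access_key = "AKIA" + "A" * 16     # synthetic: 16 uppercase characters

server_file = f'KEY = "{appsync_key}"'
plain_file = f'const id = "{access_key}"'
client_file = f'const u = import.meta.env.VITE_API_URL; const id = "{access_key}"'

for text in (server_file, plain_file, client_file):
    print(shape_leaks(text))
```

The gate reflects the comment in the file: generic shapes are already covered by trivy and gitleaks, so this extractor only adds the ships-to-browser angle for them.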

websec_validator-0.3.0/src/websec_validator/extractors/client_integrity.py ADDED

@@ -0,0 +1,126 @@
+"""Client-integrity / tamperable-display extractor — the man-in-the-browser (MITB) class.
+This is the agent-wallet lesson, generalized. When an app renders a **security-critical value**
+whose tampering redirects money — a wallet/receive address, a payment routing/account number, the
+QR that encodes it — that on-screen value is rewritable by code running in the victim's own browser
+(malware, a rogue extension, or a poisoned JS dependency in the app's own bundle). No web app can
+make on-screen display cryptographically tamper-proof; that's an inherent limit of the platform
+(it's why hardware wallets exist), accepted by Coinbase/MetaMask/banks alike.
+So this is deliberately a **LOW-confidence, architectural** flag, not a deterministic vuln. It can't
+prove tampering is possible; it checks whether the two controls that actually move the needle are
+present — and says so honestly:
+  Layer A (kill the SCALABLE vector): a strict Content-Security-Policy (`script-src 'self'` + a
+           nonce, no `unsafe-inline` / `unsafe-eval`) so an injected / supply-chain script can't run.
+  Layer B (anchor trust OFF the browser surface): an out-of-band verification path — emailed
+           canonical address, a short safety code / fingerprint, a server-rendered identicon, an
+           EIP-55 checksum — so a single-surface tamper is at least *detectable* by the user.
+A sensitive-display app missing A and/or B gets a flag pointing at exactly those layers. This is
+NOT a "your app is broken" claim — it's a "verify these compensating controls" lead for the agent.
+"""
+from __future__ import annotations
+import re
+from .base import Extractor, RepoContext
+# A value whose on-screen tampering redirects funds (the gate — financial/address-class signal).
+SENSITIVE_VALUE = re.compile(
+    r"\b(?:wallet|receive|receiving|deposit|recipient|payout|beneficiary|payment|destination)[_-]?address\b"
+    r"|\bwalletAddress\b|\btoAddress\b|\bpayTo\b|\brouting[_-]?number\b|\baccount[_-]?number\b|\biban\b"
+    r"|\b0x[0-9a-fA-F]{40}\b|crypto.{0,12}address|blockchain.{0,12}address", re.I)
+QR_SIGNAL = re.compile(r"\bqrcode\b|\bQRCode\b|react-qr|qrcode\.react|qr-code|toDataURL\(", re.I)
+CLIPBOARD = re.compile(r"navigator\.clipboard|clipboard\.writeText|copyToClipboard|useCopyToClipboard|writeText\(")
+CLIENT_MARKER = re.compile(r"['\"]use client['\"]|from\s+['\"]react|next/|\.tsx['\"]?|document\.|window\.")
+# Layer A — strict CSP detection
+CSP_PRESENT = re.compile(r"Content-Security-Policy|contentSecurityPolicy", re.I)
+CSP_SCRIPT_SELF = re.compile(r"script-src[^;'\"]*'self'", re.I)
+CSP_NONCE = re.compile(r"'nonce-|nonce-\$\{|\bstrict-dynamic\b", re.I)
+CSP_UNSAFE = re.compile(r"'unsafe-(?:inline|eval)'", re.I)
+# Layer B — out-of-band trust anchor detection
+OOB_ANCHOR = re.compile(
+    r"safety[_-]?code|safetyCode|fingerprint|identicon|blockie|jazzicon|emoji[_-]?code"
+    r"|out[_-]of[_-]band|toChecksumAddress|getAddress\(|checksumAddress|\beip[_-]?55\b|verifyAddress"
+    r"|address[_-]?verif|verif\w*[_-]?address|sendVerificationEmail|canonical[_-]?address", re.I)
+class ClientIntegrityExtractor(Extractor):
+    name = "client_integrity"
+    category = "exposure"
+    def extract(self, ctx: RepoContext, facts: dict) -> dict:
+        sensitive, qr_files, clip_files = [], [], []
+        csp_present = csp_self = csp_nonce = csp_unsafe = False
+        oob = []
+        for _p, rel, text in ctx.iter_code():
+            if SENSITIVE_VALUE.search(text):
+                if len(sensitive) < 30:
+                    sensitive.append(rel)
+            if QR_SIGNAL.search(text) and len(qr_files) < 30:
+                qr_files.append(rel)
+            if CLIPBOARD.search(text) and len(clip_files) < 30:
+                clip_files.append(rel)
+            if CSP_PRESENT.search(text):
+                csp_present = True
+                if CSP_SCRIPT_SELF.search(text):
+                    csp_self = True
+                if CSP_NONCE.search(text):
+                    csp_nonce = True
+                if CSP_UNSAFE.search(text):
+                    csp_unsafe = True
+            if OOB_ANCHOR.search(text) and len(oob) < 20:
+                oob.append(rel)
+        # strict = a real `script-src 'self'` (+ a nonce / strict-dynamic) with NO unsafe-inline/eval
+        strict_csp = bool(csp_present and csp_self and csp_nonce and not csp_unsafe)
+        out_of_band = bool(oob)
+        findings = []
+        present = bool(sensitive)
+        if present:
+            shown = ", ".join(sorted(set(sensitive))[:5])
+            if not strict_csp:
+                why = ("no Content-Security-Policy found" if not csp_present
+                       else "CSP allows 'unsafe-inline'/'unsafe-eval' in script-src" if csp_unsafe
+                       else "CSP present but not a strict script-src 'self' + nonce policy")
+                findings.append({
+                    "severity": "MEDIUM", "confidence": "LOW", "attack_class": "tamperable-display",
+                    "issue": "security-critical value rendered client-side without a strict CSP",
+                    "detail": f"This app renders a fund-redirecting value ({shown}) but {why}. A poisoned "
+                              "dependency or injected script (man-in-the-browser) can then rewrite the "
+                              "displayed/copied address or swap the QR for EVERY user at once (the scalable "
+                              "vector). Add Layer A: `script-src 'self'` + per-request nonce + `strict-dynamic`, "
+                              "no unsafe-inline/eval, object-src 'none'. (Ship report-only first to avoid "
+                              "breaking wallet SDKs, then enforce.)"})
+            if not out_of_band:
+                findings.append({
+                    "severity": "LOW", "confidence": "LOW", "attack_class": "tamperable-display",
+                    "issue": "no out-of-band trust anchor for the displayed address",
+                    "detail": f"No second, browser-independent source of truth was found for {shown} "
+                              "(emailed canonical address, a short safety code / fingerprint, a server-rendered "
+                              "identicon, or an EIP-55 checksum). Without one, a single-surface tamper is "
+                              "undetectable by the user. Add Layer B: anchor trust OFF the browser surface so "
+                              "the user can cross-check. NOTE: on-screen display can never be made "
+                              "cryptographically tamper-proof on the web — the goal is detectable, not "
+                              "impossible (the limit that hardware wallets exist to solve)."})
+        return {
+            "sensitive_display": sorted(set(sensitive)),
+            "qr_generation": sorted(set(qr_files)),
+            "clipboard_copy": sorted(set(clip_files)),
+            "strict_csp": strict_csp,
+            "csp_present": csp_present,
+            "csp_has_unsafe": csp_unsafe,
+            "out_of_band_anchor": out_of_band,
+            "anchors_found": sorted(set(oob)),
+            "findings": findings,
+            "note": ("Renders fund-redirecting value(s) — review man-in-the-browser exposure: strict CSP (kill the "
+                     "scalable vector) + an out-of-band anchor (make tamper detectable). This is the inherent "
+                     "web-platform limit; treat as architectural, LOW-confidence." if present else
+                     "No security-critical display values detected — MITB/tamperable-display class N/A."),
+        }
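
The Layer A decision in the extractor above (strict CSP vs. missing, unsafe, or weak) can be tried out on a single policy string. The following is a minimal sketch, not part of the package: the three regexes are copied from the diff, while `build_strict_csp` and `classify_csp` are hypothetical helpers introduced here for illustration. The real extractor aggregates these signals across every scanned file rather than evaluating one header.

```python
import re
import secrets

# Patterns copied verbatim from the extractor in the diff above.
CSP_SCRIPT_SELF = re.compile(r"script-src[^;'\"]*'self'", re.I)
CSP_NONCE = re.compile(r"'nonce-|nonce-\$\{|\bstrict-dynamic\b", re.I)
CSP_UNSAFE = re.compile(r"'unsafe-(?:inline|eval)'", re.I)


def build_strict_csp(nonce: str) -> str:
    """Hypothetical helper: a policy of the shape the Layer A finding recommends."""
    return f"script-src 'self' 'nonce-{nonce}' 'strict-dynamic'; object-src 'none'"


def classify_csp(policy: str) -> str:
    """Hypothetical helper mirroring the extractor's strict_csp / why branches
    for one policy string (an empty string stands in for "no CSP found")."""
    if not policy:
        return "missing"
    if CSP_UNSAFE.search(policy):
        return "unsafe"
    if CSP_SCRIPT_SELF.search(policy) and CSP_NONCE.search(policy):
        return "strict"
    return "weak"


if __name__ == "__main__":
    # A fresh nonce per response, as the finding text suggests.
    header = build_strict_csp(secrets.token_urlsafe(16))
    print(classify_csp(header))
    print(classify_csp("script-src 'self' 'unsafe-inline'"))
    print(classify_csp("default-src 'self'"))
```

One behaviour worth knowing when reading the results: because `CSP_SCRIPT_SELF` excludes quote characters between `script-src` and `'self'`, a policy that lists the nonce before `'self'` (for example `script-src 'nonce-abc' 'self'`) is not recognised as strict by these patterns and falls into the "present but not strict" branch.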
