PyPI - websec-validator - Versions diffs - 0.3.0__tar.gz → 0.4.0__tar.gz - Mend

websec-validator 0.3.0.tar.gz → 0.4.0.tar.gz


Files changed (74)

{websec_validator-0.3.0/src/websec_validator.egg-info → websec_validator-0.4.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: websec-validator
-Version: 0.3.0
+Version: 0.4.0
 Summary: Local-first security recon that briefs your AI coding agent: facts + tailored probe scripts, code-in / artifacts-out. No LLM, no server, no running app.
 Author: Ricardo Accioly
 License: MIT
@@ -82,7 +82,7 @@ Then point your agent at the output: **"Read `websec-out/AGENT-BRIEFING.md` and
 > That's the whole user surface: **`run`** (plus the optional, advanced **`dynamic`** live-probing step below). `recon`/`proof`/`calibrate` exist for developing the tool itself and are hidden from `--help` — you never need them.
-## What it extracts (13 deterministic extractors, no LLM)
+## What it extracts (15 deterministic extractors, no LLM)
 | | Dimension | Notable output |
 |---|---|---|
@@ -91,13 +91,15 @@ Then point your agent at the output: **"Read `websec-out/AGENT-BRIEFING.md` and
 | auth | scheme + login surface + **insecure-default signing secrets** | multi-scheme; flags a hard-coded `JWT_SECRET \|\| 'dev-secret'` fallback (forgeable JWT) |
 | **authz** | access-control map | guard coverage + **write endpoints with no visible guard** + roles |
 | tenant | multi-tenancy key candidates | the BOLA boundary, by frequency |
-| **password_policy** | cross-route policy consistency | flags a route enforcing fewer character classes than the strongest sibling (policy drift) |
-| surface | 14 sink classes | 12 user-input-gated (SSRF/SQLi/traversal/SSTI/…) **+ var-arg SSRF + response-side error-disclosure** |
+| **password_policy** | cross-route consistency **+ reuse/history** | complexity drift across routes **+ a set-password path that hashes without a reuse check** |
+| surface | 14 sink classes **+ redirect-SSRF** | user-input-gated sinks + var-arg SSRF + error-disclosure **+ follows-redirects-without-per-hop-guard** |
+| **upload_security** | unrestricted upload + unsafe serve | deny-list-only, stored-name-from-filename, trust-client-MIME, accept-SVG, **serve without `nosniff`** |
 | schemas | data models + **privileged fields** | Pydantic/SQLAlchemy/Django/Prisma/Mongoose/TypeORM/Zod → `role`/`isAdmin`/`groupId` for mass-assignment targeting |
-| iac_ci | IaC + CI/CD | GHA injection, unpinned actions, Dockerfile-root, tfstate **+ CDK AppSync `API_KEY` default-auth (CSWSH)** |
+| iac_ci | IaC + CI/CD | GHA injection, unpinned actions, tfstate, **CDK AppSync `API_KEY` anonymous-default-auth + WAF-as-control smell** |
 | client_exposure | browser leakage | public-var secrets by **name + value-shape (`da2-…`) + CDK build-injection**, server-secret-in-client, source maps |
-| **client_integrity** | tamperable display (man-in-the-browser) | a fund-redirecting value (wallet address/QR) shown without a strict CSP / out-of-band anchor |
-| graphql | GraphQL surface | introspection / playground / depth-limit **+ AppSync subscription-authz (cross-group BOLA) + WAF-bypass-aware introspection** |
+| **client_integrity** | tamperable display + **WS auth model** | wallet value without strict CSP / out-of-band anchor **+ the CSWSH determinant (ambient-cookie WS auth)** |
+| **pii_exposure** | unmasked PII at the output boundary | `res.json(rawEntity)` with PII + **a masking control defined but with zero live call sites** (value-shape, not field-name) |
+| graphql | GraphQL surface | introspection (**AppSync `introspectionConfig: DISABLED`-aware**) / playground / depth-limit **+ AppSync subscription-authz (cross-group BOLA)** |
 | integrations | third-party + webhooks | webhooks missing signature verification |
 Plus **derived targeting** — IDOR / SSRF / open-redirect / upload / write / auth-endpoint
@@ -206,13 +208,15 @@ publisher** with project `websec-validator`, owner `raccioly`, repo `websec-vali
 ## Status / roadmap
-**Done:** 13-extractor recon (incl. schema/entity → mass-assignment targeting, the **AWS-CDK /
-managed-AppSync / VTL boundary** — CSWSH, cross-group subscription BOLA, forgeable-JWT default
-secrets — and a **man-in-the-browser / tamperable-display** class), cross-tool de-dup + **bundled
-Semgrep rules**, tailored probe staging, agent briefing, traceable findings ledger with **calibrated
-confidence (CJE — Wilson CIs)**, proof harness, test suite, **Docker bundle** (all scanners + Noir,
-arch-aware), **dynamic phase v1** (authenticated read-only cross-tenant BOLA — validated live,
-reproduced a hand-pentest's 14/14).
+**Done:** 15-extractor recon (incl. schema/entity → mass-assignment targeting, the **AWS-CDK /
+managed-AppSync / VTL boundary**, **upload-security** + **PII-output-boundary** + **redirect-SSRF**
++ **password-reuse** classes, and a **man-in-the-browser / tamperable-display** class), cross-tool
+de-dup + **bundled Semgrep rules**, tailored probe staging, agent briefing, traceable findings ledger
+with **calibrated confidence (CJE — Wilson CIs)**, proof harness, test suite, **Docker bundle** (all
+scanners + Noir, arch-aware), **dynamic phase v1** (authenticated read-only cross-tenant BOLA —
+validated live, reproduced a hand-pentest's 14/14). Validated against the **PTREQ0013000 pen test +
+retest** (incl. correcting two findings the retest disproved: AppSync introspection *is* disablable
+engine-level, and API_KEY-default is anonymous-auth, not CSWSH).
 **Next:** dynamic write-verb BOLA + JWT/auth probes + ZAP/Nuclei two-role diff (gated, they mutate),
 calibration on hand-labeled real repos (more representative base rate), ASVS index lookup, optional
 model-SDK adapters for no-agent fallback.

{websec_validator-0.3.0 → websec_validator-0.4.0}/README.md RENAMED

@@ -70,7 +70,7 @@ Then point your agent at the output: **"Read `websec-out/AGENT-BRIEFING.md` and
 > That's the whole user surface: **`run`** (plus the optional, advanced **`dynamic`** live-probing step below). `recon`/`proof`/`calibrate` exist for developing the tool itself and are hidden from `--help` — you never need them.
-## What it extracts (13 deterministic extractors, no LLM)
+## What it extracts (15 deterministic extractors, no LLM)
 | | Dimension | Notable output |
 |---|---|---|
@@ -79,13 +79,15 @@ Then point your agent at the output: **"Read `websec-out/AGENT-BRIEFING.md` and
 | auth | scheme + login surface + **insecure-default signing secrets** | multi-scheme; flags a hard-coded `JWT_SECRET \|\| 'dev-secret'` fallback (forgeable JWT) |
 | **authz** | access-control map | guard coverage + **write endpoints with no visible guard** + roles |
 | tenant | multi-tenancy key candidates | the BOLA boundary, by frequency |
-| **password_policy** | cross-route policy consistency | flags a route enforcing fewer character classes than the strongest sibling (policy drift) |
-| surface | 14 sink classes | 12 user-input-gated (SSRF/SQLi/traversal/SSTI/…) **+ var-arg SSRF + response-side error-disclosure** |
+| **password_policy** | cross-route consistency **+ reuse/history** | complexity drift across routes **+ a set-password path that hashes without a reuse check** |
+| surface | 14 sink classes **+ redirect-SSRF** | user-input-gated sinks + var-arg SSRF + error-disclosure **+ follows-redirects-without-per-hop-guard** |
+| **upload_security** | unrestricted upload + unsafe serve | deny-list-only, stored-name-from-filename, trust-client-MIME, accept-SVG, **serve without `nosniff`** |
 | schemas | data models + **privileged fields** | Pydantic/SQLAlchemy/Django/Prisma/Mongoose/TypeORM/Zod → `role`/`isAdmin`/`groupId` for mass-assignment targeting |
-| iac_ci | IaC + CI/CD | GHA injection, unpinned actions, Dockerfile-root, tfstate **+ CDK AppSync `API_KEY` default-auth (CSWSH)** |
+| iac_ci | IaC + CI/CD | GHA injection, unpinned actions, tfstate, **CDK AppSync `API_KEY` anonymous-default-auth + WAF-as-control smell** |
 | client_exposure | browser leakage | public-var secrets by **name + value-shape (`da2-…`) + CDK build-injection**, server-secret-in-client, source maps |
-| **client_integrity** | tamperable display (man-in-the-browser) | a fund-redirecting value (wallet address/QR) shown without a strict CSP / out-of-band anchor |
-| graphql | GraphQL surface | introspection / playground / depth-limit **+ AppSync subscription-authz (cross-group BOLA) + WAF-bypass-aware introspection** |
+| **client_integrity** | tamperable display + **WS auth model** | wallet value without strict CSP / out-of-band anchor **+ the CSWSH determinant (ambient-cookie WS auth)** |
+| **pii_exposure** | unmasked PII at the output boundary | `res.json(rawEntity)` with PII + **a masking control defined but with zero live call sites** (value-shape, not field-name) |
+| graphql | GraphQL surface | introspection (**AppSync `introspectionConfig: DISABLED`-aware**) / playground / depth-limit **+ AppSync subscription-authz (cross-group BOLA)** |
 | integrations | third-party + webhooks | webhooks missing signature verification |
 Plus **derived targeting** — IDOR / SSRF / open-redirect / upload / write / auth-endpoint
@@ -194,13 +196,15 @@ publisher** with project `websec-validator`, owner `raccioly`, repo `websec-vali
 ## Status / roadmap
-**Done:** 13-extractor recon (incl. schema/entity → mass-assignment targeting, the **AWS-CDK /
-managed-AppSync / VTL boundary** — CSWSH, cross-group subscription BOLA, forgeable-JWT default
-secrets — and a **man-in-the-browser / tamperable-display** class), cross-tool de-dup + **bundled
-Semgrep rules**, tailored probe staging, agent briefing, traceable findings ledger with **calibrated
-confidence (CJE — Wilson CIs)**, proof harness, test suite, **Docker bundle** (all scanners + Noir,
-arch-aware), **dynamic phase v1** (authenticated read-only cross-tenant BOLA — validated live,
-reproduced a hand-pentest's 14/14).
+**Done:** 15-extractor recon (incl. schema/entity → mass-assignment targeting, the **AWS-CDK /
+managed-AppSync / VTL boundary**, **upload-security** + **PII-output-boundary** + **redirect-SSRF**
++ **password-reuse** classes, and a **man-in-the-browser / tamperable-display** class), cross-tool
+de-dup + **bundled Semgrep rules**, tailored probe staging, agent briefing, traceable findings ledger
+with **calibrated confidence (CJE — Wilson CIs)**, proof harness, test suite, **Docker bundle** (all
+scanners + Noir, arch-aware), **dynamic phase v1** (authenticated read-only cross-tenant BOLA —
+validated live, reproduced a hand-pentest's 14/14). Validated against the **PTREQ0013000 pen test +
+retest** (incl. correcting two findings the retest disproved: AppSync introspection *is* disablable
+engine-level, and API_KEY-default is anonymous-auth, not CSWSH).
 **Next:** dynamic write-verb BOLA + JWT/auth probes + ZAP/Nuclei two-role diff (gated, they mutate),
 calibration on hand-labeled real repos (more representative base rate), ASVS index lookup, optional
 model-SDK adapters for no-agent fallback.
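
The `pii_exposure` row above describes flagging a masking control that is defined but has zero live call sites. A minimal sketch of that idea follows. It assumes a hypothetical in-memory `files` mapping (path to source text) and deliberately simplified regexes; the helper name `dead_controls` and the sample file names are illustrative and are not taken from the package.

```python
import re

# Simplified, illustrative patterns (assumptions, not the package's own regexes).
MASK_DEF = re.compile(r"(?:function|def)\s+((?:mask|redact)\w+)")
TEST_FILE = re.compile(r"(?:^|/)(?:tests?|__tests__)/|\.(?:test|spec)\.")


def dead_controls(files: dict[str, str]) -> list[str]:
    """Return masking helpers that are defined but never called from live code."""
    defined: dict[str, str] = {}
    for path, text in files.items():
        for match in MASK_DEF.finditer(text):
            # Remember the first file that defines each helper.
            defined.setdefault(match.group(1), path)

    dead = []
    for name, def_path in defined.items():
        call = re.compile(r"\b" + re.escape(name) + r"\s*\(")
        # A call only counts if it is outside the defining file and outside tests.
        live_calls = sum(
            1
            for path, text in files.items()
            if path != def_path and not TEST_FILE.search(path) and call.search(text)
        )
        if live_calls == 0:
            dead.append(name)
    return sorted(dead)


example = {
    "src/mask.js": "function maskPhone(p) { return p.slice(-4); }",
    "src/api.js": "res.json(customer);",
    "tests/mask.test.js": "maskPhone('555');",
}
print(dead_controls(example))
```

The only call to `maskPhone` in the sample lives in a test file, so the helper is reported; adding a call in `src/api.js` would clear it.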

{websec_validator-0.3.0 → websec_validator-0.4.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "websec-validator"
-version = "0.3.0"
+version = "0.4.0"
 description = "Local-first security recon that briefs your AI coding agent: facts + tailored probe scripts, code-in / artifacts-out. No LLM, no server, no running app."
 readme = "README.md"
 requires-python = ">=3.11"

{websec_validator-0.3.0 → websec_validator-0.4.0}/src/websec_validator/briefing.py RENAMED

@@ -69,6 +69,20 @@ def render(facts: dict, scanners: dict, scan_results: list, probe_manifest: list
         pp_line = f"looks consistent across {len(pp['password_blocks'])} validator block(s)"
     else:
         pp_line = "_no password validators detected_"
+    if ((pp.get("password_reuse") or {}).get("gap")):
+        pp_line += "  ·  ⚠ NO reuse/history control (#6)"
+    up = facts.get("upload_security", {})
+    up_findings = up.get("findings", [])
+    up_section = ("\n".join(f"- **{f.get('severity')}** {f.get('kind')} — `{f.get('file')}`" for f in up_findings[:20])
+                  if up_findings else
+                  ("_upload handler(s) present; allow-list + nosniff look ok — spot-check_" if up.get("upload_handlers")
+                   else "_no upload handlers detected_"))
+    pii = facts.get("pii_exposure", {})
+    pii_findings = pii.get("findings", [])
+    pii_section = ("\n".join(f"- **{f.get('severity')}** {f.get('kind')} — `{f.get('file')}`" for f in pii_findings[:20])
+                   if pii_findings else "_no obvious raw-PII responses / dead masking controls_")
+    ws_line = (facts.get("client_integrity", {}) or {}).get("websocket_auth", "no websocket detected")
     gql = facts.get("graphql", {})
     if gql.get("present"):
@@ -181,6 +195,14 @@ Production source maps exposed: {client.get("production_source_maps", False)}
 **Client integrity — man-in-the-browser / tamperable display:**
 {ci_section}
+**WebSocket auth model (CSWSH determinant — is it an ambient cookie?):** {ws_line}
+**File-upload security (#2b — sniff bytes, derive stored name, nosniff on serve):**
+{up_section}
+**PII output boundary (#8 — verify by VALUE SHAPE, not field name):**
+{pii_section}
 **Third-party integrations:** {integ_line}
 {wh_line}
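
The upload and PII sections added to `briefing.py` share one pattern: render up to 20 findings as markdown bullets, otherwise fall back to a placeholder line. A small sketch of that pattern as a standalone function is shown here; `render_findings` is a hypothetical helper name, and the bullet separator is simplified relative to the diff.

```python
def render_findings(findings: list[dict], empty_text: str, limit: int = 20) -> str:
    """Render findings as markdown bullets, or a fallback line when there are none."""
    if not findings:
        return empty_text
    # Cap the output so one noisy extractor cannot flood the briefing.
    return "\n".join(
        f"- **{f.get('severity')}** {f.get('kind')} (`{f.get('file')}`)"
        for f in findings[:limit]
    )


sample = [{"severity": "HIGH", "kind": "dead-pii-control", "file": "src/mask.js"}]
print(render_findings(sample, "_no findings_"))
print(render_findings([], "_no findings_"))
```

Using `dict.get` keeps the renderer tolerant of findings that lack a key, which matches how the diff reads `severity`, `kind`, and `file`.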

{websec_validator-0.3.0 → websec_validator-0.4.0}/src/websec_validator/extractors/__init__.py RENAMED

@@ -17,12 +17,14 @@ from .client_integrity import ClientIntegrityExtractor
 from .graphql import GraphQLExtractor
 from .iac_ci import IacCiExtractor
 from .integrations import IntegrationsExtractor
+from .pii_exposure import PiiExposureExtractor
 from .policy_consistency import PolicyConsistencyExtractor
 from .routes import RoutesExtractor
 from .schemas import SchemasExtractor
 from .stack import StackExtractor
 from .surface import SurfaceExtractor
 from .tenant import TenantExtractor
+from .upload_security import UploadSecurityExtractor
 # Order matters: stack first (others read facts['stack']); authz after routes
 # (reads facts['routes']).
@@ -34,10 +36,12 @@ REGISTRY: list[Extractor] = [
     TenantExtractor(),
     PolicyConsistencyExtractor(),
     SurfaceExtractor(),
+    UploadSecurityExtractor(),
     SchemasExtractor(),
     IacCiExtractor(),
     ClientExposureExtractor(),
     ClientIntegrityExtractor(),
+    PiiExposureExtractor(),
     GraphQLExtractor(),
     IntegrationsExtractor(),
 ]

{websec_validator-0.3.0 → websec_validator-0.4.0}/src/websec_validator/extractors/client_integrity.py RENAMED

@@ -48,6 +48,16 @@ OOB_ANCHOR = re.compile(
     r"|out[_-]of[_-]band|toChecksumAddress|getAddress\(|checksumAddress|\beip[_-]?55\b|verifyAddress"
     r"|address[_-]?verif|verif\w*[_-]?address|sendVerificationEmail|canonical[_-]?address", re.I)
+# WebSocket / realtime auth model — the CSWSH determinant (PTREQ0013000 #4). CSWSH is only
+# exploitable when the socket authenticates via an AMBIENT COOKIE the browser auto-attaches
+# cross-origin. A token placed in the connection payload / subprotocol and stored origin-scoped is
+# NOT exploitable (SOP blocks a cross-origin page from reading it). This lets us ANSWER a CSWSH
+# scanner flag instead of guessing — the retest pushed back on exactly this and won.
+WS_USAGE = re.compile(r"new\s+WebSocket\(|socket\.io|graphql-ws|subscriptions-transport-ws|appsync-realtime"
+                      r"|\bwss?://", re.I)
+WS_COOKIE_AUTH = re.compile(r"withCredentials\s*:\s*true|credentials\s*:\s*['\"]include['\"]"
+                            r"|document\.cookie[\s\S]{0,80}?(?:socket|ws\b|websocket)", re.I)
 class ClientIntegrityExtractor(Extractor):
     name = "client_integrity"
@@ -57,6 +67,7 @@ class ClientIntegrityExtractor(Extractor):
         sensitive, qr_files, clip_files = [], [], []
         csp_present = csp_self = csp_nonce = csp_unsafe = False
         oob = []
+        ws_usage = ws_cookie = False
         for _p, rel, text in ctx.iter_code():
             if SENSITIVE_VALUE.search(text):
                 if len(sensitive) < 30:
@@ -75,10 +86,15 @@ class ClientIntegrityExtractor(Extractor):
                     csp_unsafe = True
             if OOB_ANCHOR.search(text) and len(oob) < 20:
                 oob.append(rel)
+            if WS_USAGE.search(text):
+                ws_usage = True
+            if WS_COOKIE_AUTH.search(text):
+                ws_cookie = True
         # strict = a real `script-src 'self'` (+ a nonce / strict-dynamic) with NO unsafe-inline/eval
         strict_csp = bool(csp_present and csp_self and csp_nonce and not csp_unsafe)
         out_of_band = bool(oob)
+        ws_cookie_auth = bool(ws_usage and ws_cookie)   # the CSWSH determinant (ambient-cookie WS auth)
         findings = []
         present = bool(sensitive)
@@ -109,8 +125,24 @@ class ClientIntegrityExtractor(Extractor):
                               "cryptographically tamper-proof on the web — the goal is detectable, not "
                               "impossible (the limit that hardware wallets exist to solve)."})
+        # CSWSH is ONLY real when the WS auth is an ambient cookie (PTREQ0013000 #4). This lets us
+        # answer a CSWSH scanner flag instead of guessing — a bearer token in the payload is not it.
+        if ws_cookie_auth:
+            findings.append({
+                "severity": "MEDIUM", "confidence": "LOW", "attack_class": "cswsh",
+                "issue": "WebSocket authenticated via an ambient cookie (Cross-Site WebSocket Hijacking)",
+                "detail": "A WebSocket/realtime connection appears to authenticate via a cookie "
+                          "(withCredentials / credentials:'include'), which the browser auto-attaches "
+                          "cross-origin — so a page on any origin can open an authenticated socket (CSWSH, #4). "
+                          "Validate the Origin on the handshake, or move the credential into the connection "
+                          "payload / subprotocol and store it origin-scoped (not a cookie). If WS auth is "
+                          "already a token in the payload, CSWSH is NOT exploitable."})
         return {
             "sensitive_display": sorted(set(sensitive)),
+            "websocket_auth": ("cookie (CSWSH-exposed — validate Origin)" if ws_cookie_auth
+                               else "token-or-none (CSWSH not exploitable)" if ws_usage
+                               else "no websocket detected"),
             "qr_generation": sorted(set(qr_files)),
             "clipboard_copy": sorted(set(clip_files)),
             "strict_csp": strict_csp,

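The WebSocket auth classification added to `client_integrity.py` can be exercised in isolation. The sketch below copies the two regexes from the diff and wraps them in a hypothetical `websocket_auth` function that takes a list of source strings instead of a `RepoContext`; the return labels are shortened versions of the strings in the diff.

```python
import re

# Regexes copied from the client_integrity.py hunk above.
WS_USAGE = re.compile(r"new\s+WebSocket\(|socket\.io|graphql-ws|subscriptions-transport-ws|appsync-realtime"
                      r"|\bwss?://", re.I)
WS_COOKIE_AUTH = re.compile(r"withCredentials\s*:\s*true|credentials\s*:\s*['\"]include['\"]"
                            r"|document\.cookie[\s\S]{0,80}?(?:socket|ws\b|websocket)", re.I)


def websocket_auth(sources: list[str]) -> str:
    """Classify the WebSocket auth model from repo-wide signals (simplified labels)."""
    ws_usage = any(WS_USAGE.search(s) for s in sources)
    ws_cookie = any(WS_COOKIE_AUTH.search(s) for s in sources)
    if ws_usage and ws_cookie:
        return "cookie"          # ambient-cookie auth: the CSWSH determinant
    if ws_usage:
        return "token-or-none"   # socket present, no cookie signal seen
    return "none"                # no websocket detected


print(websocket_auth(["const s = io(url, { withCredentials: true }); // socket.io"]))
print(websocket_auth(["const ws = new WebSocket('wss://example.test', ['token']);"]))
print(websocket_auth(["fetch('/api/me')"]))
```

Both flags are aggregated across every file, so an unrelated `fetch(..., {credentials: 'include'})` elsewhere in the repo also sets the cookie flag. That looseness is consistent with the `"confidence": "LOW"` the diff assigns to the finding.
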
{websec_validator-0.3.0 → websec_validator-0.4.0}/src/websec_validator/extractors/graphql.py RENAMED

@@ -33,6 +33,11 @@ TENANT_ARG = re.compile(r"\b(\w+)\s*\(([^)]*\b(?:groupId|group_id|orgId|org_id|t
 # Identity-binding signals in a VTL resolver — the field is tied to the CALLER, not a free arg.
 VTL_AUTHZ = re.compile(r"\$ctx(?:tx)?\.identity|\$context\.identity|identity\.(?:sub|username|claims|resolverContext)"
                        r"|util\.unauthorized|\bgroupIds?\b[\s\S]{0,80}?\bcontains\b|#if\s*\(\s*!?\s*\$ctx\.identity")
+# Engine-level introspection disable on aws-cdk-lib appsync.GraphqlApi. The PTREQ0013000 RETEST
+# proved this IS available and un-bypassable (unlike a WAF string-match) — so a correctly-configured
+# AppSync API must NOT be flagged. This corrects the 0.3.0 false positive that always cried wolf.
+APPSYNC_INTROSPECTION_OFF = re.compile(r"introspectionConfig\s*:\s*[\w.]*\bDISABLED\b")
+APPSYNC_LIMITING = re.compile(r"\bqueryDepthLimit\b|\bresolverCountLimit\b")
 class GraphQLExtractor(Extractor):
@@ -47,10 +52,15 @@ class GraphQLExtractor(Extractor):
         introspection, playground, limiting, code_hit = "unknown", False, False, False
         appsync, aws_directives = False, False
+        appsync_introspection_off = appsync_limiting = False
         schema_texts = []          # (rel, text) for SDL files — parsed for Subscription authz
         for _p, rel, text in ctx.iter_code():
             if APPSYNC_MARK.search(text):
                 appsync = True
+            if APPSYNC_INTROSPECTION_OFF.search(text):
+                appsync_introspection_off = True
+            if APPSYNC_LIMITING.search(text):
+                appsync_limiting = True
             if rel.endswith((".graphql", ".gql")):
                 schema_texts.append((rel, text))
                 if AWS_AUTH_DIRECTIVE.search(text):
@@ -74,14 +84,20 @@ class GraphQLExtractor(Extractor):
         findings = []
         sub_authz = []
         if managed:
-            # AppSync exposes introspection and it is NOT disablable at the API layer (no Apollo-style
-            # `introspection:false`). The report's #2 proved the WAF that "blocks" it is bypassable.
-            findings.append({"severity": "MEDIUM", "issue": "AppSync GraphQL introspection reachable",
-                             "attack_class": "graphql",
-                             "detail": "AppSync exposes schema introspection; it can't be disabled at the API layer. "
-                                       "If a WAF blocks the keyword, that string-match is bypassable via Unicode-escape "
-                                       "/ junk-byte padding (PTREQ0013000 #2). Enforce field-level @aws_* auth + run the "
-                                       "appsync-introspection probe (it attempts the bypass) — don't rely on the WAF."})
+            # AppSync introspection CAN be disabled engine-level via
+            # `introspectionConfig: IntrospectionConfig.DISABLED` (aws-cdk-lib) — un-bypassable, unlike
+            # a WAF byte-match. Only flag when it is NOT disabled (retest correction to the 0.3.0 FP).
+            if not appsync_introspection_off:
+                findings.append({"severity": "MEDIUM", "issue": "AppSync GraphQL introspection not disabled engine-level",
+                                 "attack_class": "graphql",
+                                 "detail": "Set `introspectionConfig: appsync.IntrospectionConfig.DISABLED` so the engine "
+                                           "rejects __schema/__type regardless of encoding. A WAF byte-match on `__schema` "
+                                           "is NOT sufficient — bypassable via Unicode/JSON escapes and it only fronts one "
+                                           "endpoint (PTREQ0013000 #2). Run the appsync-introspection probe to confirm."})
+            if not (appsync_limiting or limiting):
+                findings.append({"severity": "LOW", "issue": "AppSync has no query depth / resolver-count limit",
+                                 "attack_class": "graphql",
+                                 "detail": "add `queryDepthLimit` + `resolverCountLimit` (alias / deep-query DoS guard)."})
             sub_authz = self._subscription_authz(ctx, schema_texts, findings)
         else:
             if introspection in ("enabled", "unknown"):
@@ -103,7 +119,8 @@ class GraphQLExtractor(Extractor):
                              or (["AppSync GraphQL API (HTTP + realtime WebSocket)"] if managed
                                  else ["(server detected; endpoint not routed by Noir)"]),
                 "schema_files": schema_files[:20],
-                "introspection": "appsync-reachable" if managed else introspection,
+                "introspection": (("appsync-disabled" if appsync_introspection_off else "appsync-reachable")
+                                  if managed else introspection),
                 "playground_enabled": playground, "query_limiting_detected": limiting,
                 "subscription_authz": sub_authz,
                 "findings": findings,

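To see what the new AppSync checks in `graphql.py` accept and reject, the two regexes from the diff can be run against sample CDK source. This is a sketch under stated assumptions: `appsync_findings` is a hypothetical helper, the CDK snippets are illustrative strings, and the limit check ignores the extractor's separate generic `limiting` signal.

```python
import re

# Regexes copied from the graphql.py hunk above.
APPSYNC_INTROSPECTION_OFF = re.compile(r"introspectionConfig\s*:\s*[\w.]*\bDISABLED\b")
APPSYNC_LIMITING = re.compile(r"\bqueryDepthLimit\b|\bresolverCountLimit\b")


def appsync_findings(cdk_source: str) -> list[str]:
    """Return the issues the 0.4.0 logic would raise for one CDK source text."""
    issues = []
    if not APPSYNC_INTROSPECTION_OFF.search(cdk_source):
        issues.append("introspection not disabled engine-level")
    if not APPSYNC_LIMITING.search(cdk_source):
        issues.append("no query depth / resolver-count limit")
    return issues


hardened = """
new appsync.GraphqlApi(this, 'Api', {
  name: 'demo',
  introspectionConfig: appsync.IntrospectionConfig.DISABLED,
  queryDepthLimit: 10,
  resolverCountLimit: 50,
});
"""
default = "new appsync.GraphqlApi(this, 'Api', { name: 'demo' });"

print(appsync_findings(hardened))
print(appsync_findings(default))
```

The `[\w.]*` prefix in the first pattern lets it match both a bare `DISABLED` and a qualified `appsync.IntrospectionConfig.DISABLED`, so import style should not affect the result.
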
{websec_validator-0.3.0 → websec_validator-0.4.0}/src/websec_validator/extractors/iac_ci.py RENAMED

@@ -29,6 +29,12 @@ APPSYNC_DEFAULT_APIKEY = re.compile(
 APPSYNC_APIKEY_MODE = re.compile(r"AuthorizationType\.API_KEY|authorizationType\s*:\s*['\"]?API_KEY")
 WAFV2 = re.compile(r"wafv2\.CfnWebACL|\bCfnWebACL\b|aws_wafv2|wafv2\.CfnWebACLAssociation")
 WAF_ASSOC = re.compile(r"CfnWebACLAssociation|WebACLAssociation")
+# WAF used as the PRIMARY control for an app-layer flaw — a bypassable band-aid, not a remediation
+# (PTREQ0013000 #2/#11). A byteMatchStatement/regex matching `__schema`, SQL keywords or `<script`
+# means the app-layer bug is still there; the string-match is evadable via encoding + only one door.
+WAF_APPLAYER_MATCH = re.compile(
+    r"(?:byteMatchStatement|searchString|RegexPatternSet|regexString)[\s\S]{0,220}?"
+    r"(__schema|__type|UNION\s+SELECT|information_schema|<script|onerror=|\bor\s+1\s*=\s*1\b|sleep\s*\()", re.I)
 class IacCiExtractor(Extractor):
@@ -69,7 +75,7 @@ class IacCiExtractor(Extractor):
             findings.append({"severity": "HIGH", "kind": "terraform-state-committed", "file": ctx.rel(tf),
                              "detail": "tfstate may contain plaintext secrets (DB passwords, keys) — must not be committed"})
-        # --- CDK / managed-AppSync auth (#4 CSWSH; surfaces the #2/#5 boundary) ---
+        # --- CDK / managed-AppSync auth (#4 anonymous default-auth; WAF-as-control smell #2) ---
         appsync_files, waf_present, waf_assoc = [], False, False
         for _p, rel, text in ctx.iter_code():
             if not rel.endswith((".ts", ".js", ".mjs", ".cjs")):
@@ -78,15 +84,24 @@ class IacCiExtractor(Extractor):
                 waf_present = True
             if WAF_ASSOC.search(text):
                 waf_assoc = True
+            if WAF_APPLAYER_MATCH.search(text):
+                tok = (WAF_APPLAYER_MATCH.search(text).group(1) or "").strip()
+                findings.append({"severity": "MEDIUM", "kind": "waf-as-app-control", "file": rel,
+                                 "detail": f"A WAF string/regex match on an app-layer attack token ({tok!r}) is used as a "
+                                           "control. A WAF is a bypassable compensating control, never the remediation: "
+                                           "string-matches are evaded by encoding (the retest bypassed `__schema` with a "
+                                           "Unicode escape) and only cover one endpoint. Fix at the app/engine layer "
+                                           "(disable introspection, parametrize queries) and keep the WAF as defense-in-depth."})
             if APPSYNC_API.search(text):
                 appsync_files.append(rel)
                 if APPSYNC_DEFAULT_APIKEY.search(text):
                     findings.append({"severity": "HIGH", "kind": "appsync-apikey-default", "file": rel,
-                                     "detail": "AppSync defaultAuthorization is API_KEY — the realtime WebSocket "
-                                               "accepts a static key with no Origin/cookie binding (Cross-Site "
-                                               "WebSocket Hijacking + anonymous subscribe). Make the default "
-                                               "USER_POOL/OIDC/IAM/LAMBDA; keep API_KEY (if needed) as a scoped "
-                                               "additional mode only."})
+                                     "detail": "AppSync defaultAuthorization is API_KEY — the API (HTTP + realtime) accepts "
+                                               "a static key by default, and that key typically ships to the browser, so "
+                                               "this is effectively ANONYMOUS/unauthenticated access. Make the default "
+                                               "USER_POOL/OIDC/IAM/LAMBDA; keep API_KEY (if needed) to a scoped additional "
+                                               "mode. (NB: this is NOT in itself CSWSH — that needs cookie-based WS auth; "
+                                               "see the client_integrity websocket-auth check.)"})
                 elif APPSYNC_APIKEY_MODE.search(text):
                     findings.append({"severity": "MEDIUM", "kind": "appsync-apikey-mode", "file": rel,
                                      "detail": "AppSync accepts an API_KEY authorization mode — confirm it is NOT the "

websec_validator-0.4.0/src/websec_validator/extractors/pii_exposure.py ADDED

@@ -0,0 +1,98 @@
+"""PII output-boundary extractor — unmasked customer data in API responses (PTREQ0013000 #8).
+Two high-signal static tells the retest taught us:
+1. **Dead security control.** A masking helper / `view_full`-style permission EXISTS in the codebase
+   but has ZERO call sites in the live request handlers — it was wired only into offline export paths.
+   A control defined-but-never-called is worse than none (it reads as "handled"). This is very
+   distinctive and cheap to find: collect `mask*/redact*/canViewFull*` definitions, count live (non-
+   test) call sites, flag the ones with none.
+2. **Raw entity to the client.** A controller does `res.json(entity)` on a raw ORM/repo object that
+   carries PII fields, with no DTO/serializer/masker — so phone/email ship in cleartext, *including*
+   indirect carriers (a phone embedded in a composed `messageBirdId`, a denormalized `lastMessage`).
+   The decisive verification is **value-shape, not field-name** — a field allow-list misses the
+   indirect carriers — so the probe asserts no phone/email *value* reaches a non-privileged caller.
+"""
+from __future__ import annotations
+import re
+from .base import Extractor, RepoContext
+# helper/permission DEFINITIONS (function/arrow/def) — not variable assignments to a call result
+MASK_DEF = re.compile(
+    r"(?:function\s+|export\s+(?:async\s+)?function\s+|def\s+)"
+    r"(mask\w+|redact\w+|canViewFull\w+|scrub\w+|anonymi[sz]e\w+|toPublic\w+|sanitize\w*Pii)\b"
+    r"|(?:const|let|export\s+const)\s+(mask\w+|redact\w+|canViewFull\w+|toPublic\w+)\s*=\s*(?:async\s*)?\(", re.I)
+PII_FIELD = re.compile(r"\b(?:phone|phoneNumber|msisdn|mobile|email|emailAddress|ssn|socialSecurity"
+                       r"|dob|dateOfBirth|birthDate|creditCard|cardNumber|taxId|nationalId)\b", re.I)
+# returning a raw variable / a fresh ORM read straight to the client
+RES_RAW = re.compile(r"res\.(?:json|send)\s*\(\s*(?:await\s+)?[A-Za-z_$][\w$]*\s*\)"
+                     r"|res\.(?:json|send)\s*\(\s*await\s+[\w.]+\.(?:find|findOne|findById|findAll|get|query)\s*\(")
+MASK_CALL_NEAR = re.compile(r"mask\w+\(|redact\w+\(|toPublic\w+\(|canViewFull\w+\(|\.serialize\(|toDto\(|\bDTO\b|pick\(", re.I)
+TESTFILE = re.compile(r"(?:^|/)(?:tests?|__tests__|spec)/|\.(?:test|spec)\.", re.I)
+class PiiExposureExtractor(Extractor):
+    name = "pii_exposure"
+    category = "exposure"
+    def extract(self, ctx: RepoContext, facts: dict) -> dict:
+        texts = []
+        helpers: dict = {}      # name -> def file
+        for _p, rel, text in ctx.iter_code():
+            texts.append((rel, text))
+            for m in MASK_DEF.finditer(text):
+                nm = m.group(1) or m.group(2)
+                if nm and len(nm) > 4 and nm not in helpers:
+                    helpers[nm] = rel
+        findings = []
+        # 1. dead masking/permission control — defined but no LIVE (non-test) call site
+        dead = []
+        for nm, deffile in helpers.items():
+            callrx = re.compile(r"\b" + re.escape(nm) + r"\s*\(")
+            live = sum(1 for rel, text in texts
+                       if rel != deffile and not TESTFILE.search(rel) and callrx.search(text))
+            if live == 0:
+                dead.append(nm)
+                findings.append({"severity": "HIGH", "kind": "dead-pii-control", "file": deffile,
+                                 "detail": f"`{nm}` (a masking/PII-permission control) is defined but has NO live "
+                                           "call site outside its own file/tests — a security control that exists but "
+                                           "isn't wired into the request handlers (it was likely only on export/report "
+                                           "paths). Apply it at the live API output boundary, or remove the false "
+                                           "sense of safety (PTREQ0013000 #8)."})
+        # 2. raw entity with PII to the client, no masker/DTO in the handler
+        raw_leaks = []
+        for rel, text in texts:
+            if TESTFILE.search(rel):
+                continue
+            if PII_FIELD.search(text) and RES_RAW.search(text) and not MASK_CALL_NEAR.search(text):
+                if len(raw_leaks) < 30:
+                    raw_leaks.append(rel)
+                    findings.append({"severity": "MEDIUM", "kind": "raw-entity-pii-response", "file": rel,
+                                     "detail": "A handler returns a raw entity (`res.json(entity)`) in a file that "
+                                               "handles PII fields, with no DTO/serializer/masker — phone/email likely "
+                                               "ship in cleartext. Mask at ONE output boundary (a DTO), gated by a "
+                                               "permission. VERIFY BY VALUE SHAPE (no phone/email value in the JSON), "
+                                               "not field name — indirect carriers (composed IDs, denormalized fields) "
+                                               "leak too (the `messageBirdId`-embeds-the-phone class, #8)."})
+        by_sev: dict = {}
+        for f in findings:
+            by_sev[f["severity"]] = by_sev.get(f["severity"], 0) + 1
+        return {
+            "findings": findings,
+            "dead_controls": dead,
+            "raw_pii_responses": raw_leaks,
+            "masking_helpers": sorted(helpers.keys())[:20],
+            "by_severity": by_sev,
+            "note": ("PII output-boundary review: " + (f"{len(dead)} masking control(s) defined but unused; " if dead else "")
+                     + (f"{len(raw_leaks)} handler(s) return a raw PII entity. " if raw_leaks else "no obvious raw-PII responses. ")
+                     + "Probe with a per-role response diff asserting NO phone/email VALUE (/\\+?\\d{7,}/ or an email "
+                       "regex) reaches a non-privileged caller — across nested objects, IDs, and exports (#8)."),
+        }
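
The extractor's note above asks for a probe that asserts no phone or email VALUE reaches a non-privileged caller, across nested objects and IDs. A minimal sketch of such a value-shape check follows. The names `PHONE_VALUE`, `EMAIL_VALUE`, `iter_strings`, and `pii_value_leaks` are hypothetical and not part of websec-validator; the phone pattern mirrors the one quoted in the note, and the email pattern is a simple assumed stand-in.

```python
import re
from typing import Any, Iterator

# Hypothetical value-shape patterns; the phone one mirrors the note's /\+?\d{7,}/.
PHONE_VALUE = re.compile(r"\+?\d{7,}")
EMAIL_VALUE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def iter_strings(node: Any, path: str = "$") -> Iterator[tuple[str, str]]:
    """Yield (json_path, text) for every non-null, non-boolean leaf of a decoded JSON payload."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{path}[{index}]")
    elif node is not None and not isinstance(node, bool):
        yield path, str(node)


def pii_value_leaks(payload: Any) -> list[tuple[str, str]]:
    """Return (json_path, kind) for each leaf whose VALUE looks like an email or phone."""
    leaks = []
    for path, text in iter_strings(payload):
        if EMAIL_VALUE.search(text):
            leaks.append((path, "email"))
        elif PHONE_VALUE.search(text):
            leaks.append((path, "phone"))
    return leaks


# A response as a non-privileged caller might see it: the named fields are
# masked, but a composed ID still embeds the phone number.
response = {
    "id": 42,
    "phone": "***-***-1234",
    "email": None,
    "messages": [{"messageBirdId": "mb_+15551234567_0001"}],
}
print(pii_value_leaks(response))
```

Because the check looks at values rather than field names, it flags the composed ID even though the `phone` and `email` fields are clean. The same property makes it noisy: any numeric leaf with seven or more digits (a timestamp, a large database ID) will also match, so a real probe would likely need an allow-list of known-safe paths.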

{websec_validator-0.3.0 → websec_validator-0.4.0}/src/websec_validator/extractors/policy_consistency.py RENAMED

@@ -42,6 +42,17 @@ _RE_STRONG = re.compile(r"isStrongPassword", re.I)
 _ALL = ("min", "upper", "lower", "digit", "special")
+# Password REUSE / history — a DIFFERENT control from complexity (PTREQ0013000 #6, which we initially
+# misread as complexity). A set-password path that hashes a new password with no comparison to the
+# current / previous hashes lets a user re-set the same password. Signals:
+HASH_NEW = re.compile(r"bcrypt(?:js)?\.hash|argon2\.hash|\bscrypt\b|pbkdf2|hashPassword\(|\.setPassword\(", re.I)
+REUSE_CHECK = re.compile(r"isPasswordReused|passwordHistory|password_history|previousPasswords|prior[_-]?hashes"
+                         r"|bcrypt(?:js)?\.compare[\s\S]{0,200}?(?:history|previous|current|old)", re.I)
+PW_HASH_FIELD = re.compile(r"\b(?:passwordHash|password_hash|hashedPassword|pwdHash|passwordDigest)\b")
+PW_HISTORY_FIELD = re.compile(r"\b(?:passwordHistory|password_history|previousPasswords|passwordHistoryHashes|priorPasswords)\b")
+SET_PW_CTX = re.compile(r"changePassword|updatePassword|setPassword|resetPassword|updateProfile|adminUpdate"
+                        r"|set[_-]?password|change[_-]?password", re.I)
 def _classes(window: str) -> set:
     """The character-class requirement set enforced in one validation window."""
@@ -68,7 +79,23 @@ class PolicyConsistencyExtractor(Extractor):
     def extract(self, ctx: RepoContext, facts: dict) -> dict:
         blocks = []        # (file, frozenset(classes))
         seen = set()
+        hashes = reuse_check = set_ctx = model_pwhash = model_history = False
         for _p, rel, text in ctx.iter_code():
+            # Reuse signals live in camelCase compounds (changePassword/passwordHash) that PW_FIELD's
+            # \bword\b boundaries miss — so track them on a cheap substring pre-check, NOT behind the
+            # complexity gate below (that bug initially made the reuse check silently never fire).
+            low = text.lower()
+            if "password" in low or "bcrypt" in low or "argon2" in low or "scrypt" in low or "pbkdf2" in low:
+                if HASH_NEW.search(text):
+                    hashes = True
+                if REUSE_CHECK.search(text):
+                    reuse_check = True
+                if SET_PW_CTX.search(text):
+                    set_ctx = True
+                if PW_HASH_FIELD.search(text):
+                    model_pwhash = True
+                if PW_HISTORY_FIELD.search(text):
+                    model_history = True
             if not PW_FIELD.search(text):
                 continue
             # FORWARD-only window, capped at the next password field — validation follows the field
@@ -110,11 +137,22 @@ class PolicyConsistencyExtractor(Extractor):
             if len(smax) < 3:
                 weak_policy = strongest
+        # Password REUSE / history (#6) — the DIFFERENT control: a set-password path that hashes a new
+        # password with no reuse comparison, and/or a passwordHash model with no history field, lets a
+        # user re-set the same/old password. (Complexity is the drift check above; this is reuse.)
+        reuse_gap = bool(hashes and (set_ctx or model_pwhash) and not reuse_check and not model_history)
+        password_reuse = {
+            "hashes_passwords": hashes, "has_set_password_path": set_ctx,
+            "has_reuse_check": reuse_check, "model_has_passwordHash": model_pwhash,
+            "model_has_history": model_history, "gap": reuse_gap,
+        }
         return {
             "password_blocks": blocks[:20],
             "strongest_policy": strongest,
             "drift": drift,                    # MEDIUM in findings.py — inconsistent siblings (#6)
             "weak_policy": weak_policy,        # LOW — uniformly weak, no strong sibling to compare
+            "password_reuse": password_reuse,  # MEDIUM — no reuse/history control on set-password (#6)
             "consistent": not drift,
             "note": ("Password-policy DRIFT: a sibling route enforces fewer character classes than the "
                      "strongest one found — align them (the WU #6 regression). " if drift else

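The reuse gap flagged above is a set-password path that hashes the new password without comparing it to the current or previous hashes. The following is a minimal sketch of what the missing control could look like, not the package's code. It uses the standard library's PBKDF2 as a stand-in for bcrypt/argon2 so that it runs without third-party dependencies; `StoredHash`, `User`, `is_password_reused`, `set_password`, and the `keep=5` history depth are all hypothetical.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass, field


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a hash with PBKDF2-HMAC-SHA256 (stdlib stand-in for bcrypt/argon2)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


@dataclass
class StoredHash:
    salt: bytes
    digest: bytes


@dataclass
class User:
    current: StoredHash
    history: list[StoredHash] = field(default_factory=list)  # most recent first


def is_password_reused(user: User, candidate: str) -> bool:
    """True if the candidate matches the current hash or any remembered one."""
    for stored in [user.current, *user.history]:
        # Each stored hash has its own salt, so re-derive per entry.
        digest = hash_password(candidate, stored.salt)
        if hmac.compare_digest(digest, stored.digest):
            return True
    return False


def set_password(user: User, new_password: str, keep: int = 5) -> None:
    """Reject reuse, then rotate the old hash into a bounded history."""
    if is_password_reused(user, new_password):
        raise ValueError("password was used before")
    salt = os.urandom(16)
    user.history = [user.current, *user.history][:keep]
    user.current = StoredHash(salt, hash_password(new_password, salt))


salt = os.urandom(16)
user = User(StoredHash(salt, hash_password("first-Secret1", salt)))
set_password(user, "second-Secret2")
print(is_password_reused(user, "first-Secret1"))  # the old password is remembered
```

In the extractor's terms, a codebase shaped like this would set both `has_reuse_check` and `model_has_history`, so `gap` would be false. The comparison has to re-hash the candidate against every stored salt, which is why the history is bounded.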
{websec_validator-0.3.0 → websec_validator-0.4.0}/src/websec_validator/extractors/surface.py RENAMED

@@ -67,6 +67,14 @@ SINKS = {
 }
+# SSRF-via-redirect (PTREQ0013000 #1): axios/requests FOLLOW redirects by DEFAULT, so an outbound
+# client on a variable URL re-validates only the FIRST hop unless it pins maxRedirects:0 or adds a
+# per-hop guard. One of these present = the chain is guarded; absent next to an SSRF sink = the lead
+# (allow-list on the input URL is necessary but never sufficient — a 302 to 169.254.169.254 wins).
+REDIRECT_GUARD = re.compile(r"beforeRedirect|maxRedirects\s*:\s*0\b|allow_redirects\s*=\s*False"
+                            r"|validateRedirect|isAllowed\w*Url|on[_-]?redirect|checkRedirect", re.I)
 class SurfaceExtractor(Extractor):
     name = "surface"
     category = "sinks"
@@ -78,6 +86,7 @@ class SurfaceExtractor(Extractor):
         found: dict = {k: [] for k in SINKS}
         counts: dict = {k: 0 for k in SINKS}
+        ssrf_redirect: list = []    # SSRF sink in a file with NO per-hop redirect guard (#1)
         for _p, rel, text in ctx.iter_code():
             for cls, (_probe, gate, rx) in SINKS.items():
                 if gate == "sql" and not has_sql:
@@ -88,12 +97,16 @@ class SurfaceExtractor(Extractor):
                     counts[cls] += 1
                     if len(found[cls]) < 60:
                         found[cls].append(rel)
+            if (len(ssrf_redirect) < 40 and not REDIRECT_GUARD.search(text)
+                    and (SINKS["ssrf-outbound-http"][2].search(text) or SINKS["ssrf"][2].search(text))):
+                ssrf_redirect.append(rel)
         sinks = {k: {"probe": SINKS[k][0], "count": counts[k], "files": found[k]}
                  for k in SINKS if counts[k]}
         return {
             "sinks": sinks,
             "sink_counts": {k: counts[k] for k in SINKS if counts[k]},
+            "ssrf_redirect_unguarded": ssrf_redirect,   # validate EVERY hop, not just the input URL (#1)
             "datastore_class": ("sql" if has_sql else ("nosql" if has_nosql else "unknown")),
             "note": "Each sink hit is user-input-gated (req./request./concat/interp), so these are "
                     "higher-confidence leads. Cross-reference the files with routes.targeting to pick "

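The `ssrf_redirect_unguarded` lead above rests on the point that an allow-list on the input URL is not enough when the HTTP client follows redirects on its own. A minimal sketch of per-hop validation is shown below, under stated assumptions: `is_allowed_url` and `fetch_validating_each_hop` are hypothetical names, the transport is injected as a `fetch` callable that must not follow redirects itself, and DNS names are accepted without resolution, which a real guard would also need to handle.

```python
import ipaddress
from typing import Callable, Optional
from urllib.parse import urljoin, urlsplit

# fetch(url) -> (status_code, Location header or None); must NOT follow redirects itself.
Fetch = Callable[[str], tuple[int, Optional[str]]]


def is_allowed_url(url: str) -> bool:
    """Allow only http(s) URLs whose host is not a literal internal IP address."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        ip = ipaddress.ip_address(parts.hostname)
    except ValueError:
        return True  # a DNS name; resolving and re-checking it is out of scope here
    return not (ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved)


def fetch_validating_each_hop(url: str, fetch: Fetch, max_hops: int = 5) -> str:
    """Follow redirects manually, validating every hop; return the final URL."""
    for _ in range(max_hops + 1):
        if not is_allowed_url(url):
            raise ValueError(f"blocked hop: {url}")
        status, location = fetch(url)
        if status not in (301, 302, 303, 307, 308) or not location:
            return url
        url = urljoin(url, location)  # the Location header may be relative
    raise ValueError("too many redirects")


# Simulated transport: the first URL passes the allow-list, then redirects
# to the cloud metadata address.
hops = {
    "https://example.com/a": (302, "http://169.254.169.254/latest/meta-data/"),
    "https://example.com/b": (302, "/c"),
    "https://example.com/c": (200, None),
}
print(fetch_validating_each_hop("https://example.com/b", lambda u: hops[u]))
```

With the `requests` library, the injected `fetch` would typically wrap a call made with `allow_redirects=False`, which is one of the patterns `REDIRECT_GUARD` treats as a guard. Starting from `https://example.com/a` instead raises `ValueError`, because the second hop is a link-local address.
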